Abstract

We continue the study of constructing explicit extractors for independent general weak random
sources. The ultimate goal is to give a construction that matches what is given by the
probabilistic method: an extractor for two independent n-bit weak random sources with
min-entropy as small as $\log n + O(1)$. Previously, the best known result in the two-source case was
an extractor by Bourgain [Bou05], which works for min-entropy 0.49n; and the best known
result in the general case was an earlier work of the author [Li13a], which gives an extractor for a
constant number of independent sources with min-entropy polylog(n). However, the constant in
the construction of [Li13a] depends on the hidden constant in the best known seeded extractor,
and can be large; moreover, the error in that construction is only 1/poly(n).

In this paper, we make two important improvements over the result in [Li13a]. First, we
construct an explicit extractor for three independent sources on n bits with min-entropy
$k \ge \mathrm{polylog}(n)$. In fact, our extractor works for one independent source with poly-logarithmic
min-entropy and another independent block source with two blocks each having poly-logarithmic
min-entropy. Thus, our result is nearly optimal, and the next step would be to break the 0.49n
barrier in two-source extractors. Second, we improve the error of the extractor from 1/poly(n)
to $2^{-k^{\Omega(1)}}$, which is almost optimal and crucial for cryptographic applications. Some of the
techniques developed here may be of independent interest.


1 Introduction 


Randomness extractors are fundamental objects in studying the role of randomness in computation.
Randomness has wide applications in computation (ranging from algorithms and distributed computing
to cryptography and interactive proofs), and these applications typically require the randomness
used to be uniform, whereas real-world random sources are almost always biased and defective.
Randomness extractors are functions that transform such imperfect random sources
into nearly uniform random bits. In addition, these objects are especially useful in cryptographic
applications, since there even originally uniform random secrets can be compromised as a result of
side channel attacks. To formally define randomness extractors, we model imperfect randomness
as an arbitrary probability distribution with a certain amount of entropy, and we use the standard
min-entropy to measure the randomness in such an imperfect random source.

Definition 1.1. The min-entropy of a random variable X is

$$H_\infty(X) = \min_{x \in \mathrm{supp}(X)} \log_2 (1/\Pr[X = x]).$$

For $X \in \{0,1\}^n$, we call X an $(n, H_\infty(X))$-source, and we say X has entropy rate $H_\infty(X)/n$.
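As a small numerical illustration of Definition 1.1, the following sketch computes the min-entropy of explicitly listed distributions. The helper name `min_entropy` and the dictionary encoding of a distribution are illustrative choices made here, not notation from the paper.

```python
import math

def min_entropy(dist):
    """H_inf(X) = min over the support of log2(1 / Pr[X = x]).

    `dist` maps outcomes to probabilities (an illustrative encoding).
    """
    support = [p for p in dist.values() if p > 0]
    if not support or abs(sum(support) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # The minimum of log2(1/p) is attained at the heaviest point.
    return -math.log2(max(support))

# A flat source on 4 of the 8 strings in {0,1}^3 is a (3, 2)-source.
flat = {"000": 0.25, "011": 0.25, "101": 0.25, "110": 0.25}
# A skewed source: the single heaviest point determines the min-entropy.
skewed = {"000": 0.5, "001": 0.25, "010": 0.125, "011": 0.125}

print(min_entropy(flat), min_entropy(skewed))
```

The flat source has entropy rate 2/3, while the skewed source has min-entropy only 1 even though its support is equally large.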

Ideally, one would hope to construct a deterministic extractor that works for any imperfect 
random source with a certain amount of min-entropy. However, it is easy to show that this is an 
impossible task. Thus the study of randomness extractors has taken two different approaches. 

The first is to give the extractor an additional independent uniform random string (i.e., make 
the extractor probabilistic). These extractors are called seeded extractors and were introduced by 
Nisan and Zuckerman [NZ96] . The formal definition is given below. 

Definition 1.2. (Seeded Extractor) A function $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is a
$(k, \epsilon)$-extractor if for every source X with min-entropy k and independent Y which is uniform on $\{0,1\}^d$,

$$|\mathrm{Ext}(X, Y) - U_m| \le \epsilon.$$

It is a strong $(k, \epsilon)$-extractor if in addition we have

$$|(\mathrm{Ext}(X, Y), Y) - (U_m, Y)| \le \epsilon,$$

where $|\cdot|$ denotes the statistical distance.
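The statistical distance used in Definition 1.2 can be computed directly for small distributions. The sketch below is only an illustration; the function names and the dictionary encoding are assumptions introduced here.

```python
from fractions import Fraction

def statistical_distance(p, q):
    """Half the L1 distance between two distributions given as {outcome: probability}."""
    outcomes = set(p) | set(q)
    return sum(abs(Fraction(p.get(z, 0)) - Fraction(q.get(z, 0))) for z in outcomes) / 2

def uniform(m):
    """The uniform distribution U_m on m-bit strings, encoded as integers."""
    return {z: Fraction(1, 2 ** m) for z in range(2 ** m)}

# A bit that equals 0 with probability 3/4 is 1/4-far from a uniform bit.
biased_bit = {0: Fraction(3, 4), 1: Fraction(1, 4)}
print(statistical_distance(biased_bit, uniform(1)))
```

Exact rational arithmetic is used so that small distances are not blurred by floating-point rounding.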

One can show that with a very small amount of additional random bits (called the seed, and
typically of length, say, $d = O(\log n)$), it is possible to construct extractors for all weak random
sources. Moreover, even without the auxiliary uniform random bits, these extractors can be used 
in many applications (such as simulating randomized algorithms using weak random sources) just 
by trying all possible values of the seed. Seeded extractors have also been found to be related to 
many other areas in computer science, and today we have nearly optimal constructions of such 
extractors (e.g., [LRVW03, GUV09, DW08, DKSS09]). 

However, seeded extractors are not enough for many other important applications, most notably 
the ones in distributed computing and cryptography, where the trick of trying all possible values 
of the seed does not work. Instead, in these applications we need extractors without the uniform 
random seed. These extractors are called seedless extractors. Given that it is impossible to build 
extractors that use just a single weak random source, one natural alternative is to try to build
extractors that use multiple independent weak random sources. Indeed, it seems reasonable to
assume that we can find more than one independent weak source in nature, such as the stock market,
thermal noise, computer mouse movements and so on. Such extractors are called independent
source extractors. A formal definition is given below.

Definition 1.3 (Independent Source Extractor). A function $\mathrm{IExt} : (\{0,1\}^n)^t \to \{0,1\}^m$ is an
extractor for independent (n, k) sources that uses t sources and outputs m bits with error $\epsilon$, if for
any t independent (n, k) sources $X_1, \ldots, X_t$, we have

$$|\mathrm{IExt}(X_1, X_2, \ldots, X_t) - U_m| \le \epsilon,$$

where $|\cdot|$ denotes the statistical distance.

Constructing independent source extractors is a major problem in the area of pseudorandomness, 
and has been studied for a long time. Indeed these extractors have been used in distributed 
computing and cryptography (e.g., the network extractor protocols in [KLRZ08, KLR09]). Here, 
one natural goal is to construct extractors that use as few sources as possible. For
example, in [CG88], Chor and Goldreich showed that the well-known Lindsey's lemma gives an
extractor for two independent (n, k) sources with $k > n/2$. One can also use the probabilistic
method to show that there exists a deterministic extractor for just two independent sources with
logarithmic min-entropy, which is optimal since extractors for one weak source do not exist. In fact,
the probabilistic method shows that with high probability a random function is such a two-source
extractor. Thus, the explicit construction of independent source extractors is also closely related to
the general problem of derandomization.
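To make the Chor-Goldreich example concrete, the sketch below takes the inner product over GF(2) as the two-source function (the standard function to which Lindsey's lemma is applied; the text above does not name it, so treat this choice as an assumption) and checks a one-bit output against the bound the lemma implies for flat sources. All function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product

def inner_product_bit(x, y):
    """<x, y> over GF(2) for n-bit strings encoded as integers."""
    return bin(x & y).count("1") % 2

def distance_from_uniform(A, B):
    """|Pr[<X, Y> = 1] - 1/2| when X and Y are uniform on the sets A and B."""
    ones = sum(inner_product_bit(x, y) for x, y in product(A, B))
    return abs(Fraction(ones, len(A) * len(B)) - Fraction(1, 2))

def lindsey_bound(n, size_a, size_b):
    """Lindsey's lemma: |sum over A x B of (-1)^<x,y>| <= sqrt(|A| |B| 2^n).

    Dividing by |A||B| bounds the bias; half the bias is the distance of the bit from uniform.
    """
    return math.sqrt(2 ** n / (size_a * size_b)) / 2

n = 4
A = list(range(8))  # flat source on 8 strings: min-entropy 3 > n/2
B = list(range(8))
print(distance_from_uniform(A, B), "<=", lindsey_bound(n, len(A), len(B)))
```

Only flat sources are checked here; a general (n, k) source is a convex combination of flat sources with the same min-entropy, so the same bound carries over.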

Independent source extractors also have close connections to Ramsey graphs. For example, given 
any boolean function with two n-bit inputs, one can construct a bipartite graph with $N = 2^n$ vertices
on each side, such that two vertices are connected if and only if the output is 1. If the function is
a two-source extractor for (n, k) sources, then the resulting bipartite graph has no bipartite clique
or independent set of size $K = 2^k$ (i.e., it is a bipartite Ramsey graph). With some extra effort, this
bipartite Ramsey graph can also be converted to a regular Ramsey graph. More generally, extractors that
use a small (say constant) number of sources give Ramsey hypergraphs.
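The correspondence with bipartite Ramsey graphs can be checked by brute force on tiny instances. The sketch below uses the inner product over GF(2) as the boolean function, which is an illustrative choice made here, and searches for a K x K biclique or bi-independent set; it is exponential-time and only meant for very small n.

```python
from itertools import combinations

def inner_product_bit(x, y):
    """<x, y> over GF(2) for n-bit strings encoded as integers."""
    return bin(x & y).count("1") % 2

def has_monochromatic_subgrid(n, K):
    """True if some K left vertices and K right vertices form a biclique or a bi-independent set."""
    vertices = range(2 ** n)
    for rows in combinations(vertices, K):
        for cols in combinations(vertices, K):
            values = {inner_product_bit(x, y) for x in rows for y in cols}
            if len(values) == 1:  # all edges present, or all absent
                return True
    return False

print(has_monochromatic_subgrid(2, 2), has_monochromatic_subgrid(2, 3))
```

For n = 2 a monochromatic 2 x 2 subgrid exists but no 3 x 3 one, so the 4 + 4 vertex graph defined by the inner product is bipartite Ramsey for K = 3.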

Finally, independent source extractors are also quite useful in constructing seedless extractors 
for other structured sources, because in many cases other structured sources can be reduced to 
independent sources. Two such examples are the constructions of extractors for affine sources in 
[Li11b] and extractors for small space sources in [KRVZ06].

However, despite considerable efforts spent on independent source extractors, the known
constructions of two-source extractors are far from optimal. To date, the best known two-source
extractor, due to Bourgain [Bou05], only works for entropy $k \ge (1/2 - \delta)n$ for some small universal
constant $\delta > 0$. Quantitatively, this is just a slight improvement over the result by Chor and
Goldreich [CG88]. Given the difficulty of constructing better two-source extractors, researchers have
turned to the alternative approach of constructing extractors that use a few more weak random
sources, and ideally ones that only use a constant number of sources.

This approach has been quite fruitful, starting from the work of Barak, Impagliazzo and
Wigderson [BIW04], who applied techniques from additive combinatorics to show how to extract from a
constant number ($\mathrm{poly}(1/\delta)$) of independent $(n, \delta n)$ sources, for any constant $\delta > 0$. Following
this work, by using more involved techniques, Barak et al. [BKS+05] constructed extractors for
three independent $(n, \delta n)$ sources for any constant $\delta > 0$. This was later improved by Raz [Raz05]
to give an extractor that works for three independent sources where only one is required to be
an $(n, \delta n)$ source while the other two can have entropy as small as $k \ge \mathrm{polylog}(n)$. In the same
paper Raz also gave an extractor for two independent sources where one is required to have entropy
$k \ge (1/2 + \delta)n$ for any constant $\delta > 0$, and the other can have entropy as small as $k \ge \mathrm{polylog}(n)$.
Most of these works use advanced techniques in additive combinatorics, such as sum-product
theorems and incidence theorems. However, these results only achieve a constant number of sources if
at least one source has min-entropy $\delta n$ for some constant $\delta > 0$.

By using clever ideas related to somewhere random sources, Rao [Rao06] and subsequently
Barak et al. [BRSW06] constructed extractors for general (n, k) sources that use $O(\log n/\log k)$
independent sources. In particular, these results give extractors that only use a constant number
of sources even if the min-entropy is $n^\delta$ for any constant $\delta > 0$. They are thus a big improvement
over previous results. Based on these techniques, in [Li11a] the author gave an extractor for three
independent (n, k) sources with $k \ge n^{1/2+\delta}$ for any constant $\delta > 0$. However, in the worst case where
$k = \mathrm{polylog}(n)$, the number of sources required is still super-constant (i.e., $O(\log n/\log\log n)$).

In a recent breakthrough [Li13b, Li13a], the author further exploited the properties of somewhere
random sources and established a connection between extraction from such sources and the
problem of leader election in distributed computing. Based on this connection, the author managed
to construct the first explicit extractor that uses only a constant number of sources even if
the entropy is as small as polylog(n) [Li13a]. More specifically, for any constant $\eta > 0$, the result
gives an explicit extractor for min-entropy $k \ge \log^{2+\eta} n$ that uses $O(1/\eta) + O(1)$ independent (n, k)
sources. This is the first explicit independent source extractor that comes close to optimal.

However, the result in [Li13a] still suffers from two drawbacks. First, the O(1) term can be
quite large. This is because the construction first uses a seeded extractor to convert several
independent (n, k) sources into somewhere random sources (by using every possible value of the seed
to extract from the source and then taking the concatenation), and then takes the XOR of these
somewhere random sources to reduce the error. To ensure efficient computability we need the
seed length of the seeded extractor to be $O(\log n)$, while to ensure that the number of sources needed
is a constant, we need the error of the seeded extractor to be at most 1/poly(n). Thus, we need
an optimal (up to constant factors) seeded extractor in the case where the error $\epsilon = 1/\mathrm{poly}(n)$.
For example, the extractor in [LRVW03] does not suffice because it is only optimal when the error
$\epsilon = \exp(-\log n/\log^{(c)} n)$, which is larger than any 1/poly(n).

Suppose we have a seeded extractor with seed length $d = \log n + C\log(1/\epsilon)$ for some constant
C > 1; then the above XOR step needs at least C + 1 independent weak sources. One can show
that the constant C here must be at least 2, thus even if we have truly optimal seeded extractors,
this step requires at least 3 sources. After that we need at least one extra source to convert
the somewhere random source into another somewhere random source with the “almost h-wise
independent property” as in [Li13a], and we need at least two other sources to extract nearly
uniform random bits. Therefore, even with truly optimal seeded extractors the construction in
[Li13a] requires at least 6 independent sources.

Unfortunately, currently we do not have truly optimal seeded extractors, but rather extractors 
that are optimal up to constant factors. The two known constructions of such extractors are 
[GUV09] and [DW08] (and the related [DKSS09]), both of which first apply a condenser to transform 
the weak source into a new source with entropy rate a for some constant a > 0, and then apply 
an optimal seeded extractor for such sources. However, the seeded extractors for such sources may 
already have a big constant C in the seed length. For example, the extractor by Zuckerman [Zuc97]
for such sources can be estimated to have C > 30, while a different construction in [GUV09] has
an even larger constant, potentially reaching C > 100. Other constructions such as the block source
extractor used in [DW08] have similar behavior. Therefore, by using these seeded extractors, the
O(1) term in the result of [Li13a] can be quite large (e.g., > 30).
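The source count discussed above is simple arithmetic, and the following sketch just tallies the three stages as described in the text: C + 1 sources for the XOR step, one for the h-wise independence step, and two for the final extraction. The function name is hypothetical, and the tally is a lower bound for the additive term only.

```python
def min_sources_li13a(C):
    """Lower bound on the additive source count of the [Li13a] construction,
    given a seeded extractor with seed length log n + C log(1/eps)."""
    xor_step = C + 1          # XOR of C + 1 somewhere random sources
    hwise_step = 1            # one source for the almost h-wise independent property
    final_extraction = 2      # two more sources to extract nearly uniform bits
    return xor_step + hwise_step + final_extraction

for C in (2, 30, 100):
    print(C, min_sources_li13a(C))
```

With the best possible constant C = 2 this gives 6 sources, and with C = 30 it already exceeds 30, matching the estimates in the text.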

Another drawback of the result in [Li13a] is that the construction only achieves error 1/poly(n).
This kind of error is not enough for many cryptographic applications, where we typically need to
have a negligible error (i.e., $n^{-\omega(1)}$).

1.1 Our results 

In this paper, we further improve the results in [Li13a]. We construct an explicit extractor for three
independent sources on n bits with min-entropy $k \ge \mathrm{polylog}(n)$. In fact, our extractor works for one
independent source with poly-logarithmic min-entropy and another independent block source with
two blocks each having poly-logarithmic min-entropy. We also improve the error of the extractor
from 1/poly(n) to $2^{-k^{\Omega(1)}}$. Specifically, we have the following theorem.

Theorem 1.4. For all $n, k \in \mathbb{N}$ with $k \ge \log^{12} n$, there is an efficiently computable function
$\mathrm{IExt} : \{0,1\}^n \times \{0,1\}^{2n} \to \{0,1\}^m$ such that if X is an (n, k)-source and $Y = (Y_1, Y_2)$ is an
independent (k, k) block source where each block has n bits, then

$$|(\mathrm{IExt}(X, Y), Y) - (U_m, Y)| \le \epsilon$$

and

$$|(\mathrm{IExt}(X, Y), X) - (U_m, X)| \le \epsilon,$$

where $m = 0.9k$ and $\epsilon = 2^{-k^{\Omega(1)}}$ (see footnote 1).

As a corollary this immediately gives the following theorem. 

Theorem 1.5. For all $n, k \in \mathbb{N}$ with $k \ge \log^{12} n$, there is an efficiently computable three-source
extractor $\mathrm{IExt} : (\{0,1\}^n)^3 \to \{0,1\}^m$ such that if X, Y, Z are three independent (n, k)-sources, then

$$|\mathrm{IExt}(X, Y, Z) - U_m| \le \epsilon,$$

where $m = 0.9k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.

If the min-entropy k is very close to $\log^2 n$, then we also have improved results over [Li13b]. In
particular, we have the following theorem.

Theorem 1.6. For every constant $\eta > 0$ and all $n, k \in \mathbb{N}$ with $k \ge \log^{2+\eta} n$, there is an efficiently
computable extractor $\mathrm{BExt} : (\{0,1\}^n)^t \times (\{0,1\}^n)^t \to \{0,1\}^m$ with $t = \lceil 7/\eta \rceil + 1$, such that if
$X = (X_1, X_2, \ldots, X_t)$ and $Y = (Y_1, Y_2, \ldots, Y_t)$ are two independent $(k, k, \ldots, k)$-block sources where
each block has n bits, then

$$|(\mathrm{BExt}(X, Y), Y) - (U_m, Y)| \le \epsilon$$

and

$$|(\mathrm{BExt}(X, Y), X) - (U_m, X)| \le \epsilon,$$

where $m = 0.9k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.

Footnote 1: We can show that this error is strictly $n^{-\omega(1)}$.

As a corollary, we immediately obtain the following theorem. 

Theorem 1.7. For every constant $\eta > 0$ and all $n, k \in \mathbb{N}$ with $k \ge \log^{2+\eta} n$, there is an efficiently
computable extractor $\mathrm{IExt} : (\{0,1\}^n)^t \to \{0,1\}^m$ with $t = \lceil 14/\eta \rceil + 2$ such that if $X_1, \ldots, X_t$ are t
independent (n, k)-sources, then

$$|\mathrm{IExt}(X_1, \ldots, X_t) - U_m| \le \epsilon,$$

where $m = 0.9k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.

For example, the above theorem gives an extractor for min-entropy $k = \log^3 n$ that uses 16
sources, and an extractor for min-entropy $k = \log^4 n$ that uses 9 sources.
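The two source counts quoted above can be reproduced with a one-line formula. The sketch below assumes the count has the form ceil(c/eta) + 2 with c = 14, which is the value consistent with both quoted examples (eta = 1 for $k = \log^3 n$ and eta = 2 for $k = \log^4 n$); the function name and the default constant are assumptions of this illustration.

```python
import math
from fractions import Fraction

def sources_needed(eta, c=14):
    """ceil(c / eta) + 2 sources for min-entropy k >= log^(2 + eta) n.

    The constant c = 14 is inferred from the two worked examples (an assumption).
    """
    return math.ceil(Fraction(c) / Fraction(eta)) + 2

print(sources_needed(1))  # k = log^3 n
print(sources_needed(2))  # k = log^4 n
```

Larger eta, that is, more min-entropy, lowers the number of sources, approaching the additive term 2 plus one as eta grows past 14.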

Remark 1.8. In all theorems, the constant 0.9 can be replaced by any constant less than 1. 


Table 1 summarizes our results compared to previous constructions of independent source
extractors.


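| Construction | Number of Sources | Min-Entropy | Output | Error |
|---|---|---|---|---|
| [CG88] | 2 | $k \ge (1/2 + \delta)n$, any constant $\delta$ | $\Theta(n)$ | $2^{-\Omega(n)}$ |
| [BIW04] | $\mathrm{poly}(1/\delta)$ | $\delta n$, any constant $\delta$ | $\Theta(n)$ | $2^{-\Omega(n)}$ |
| [BKS+05] | 3 | $\delta n$, any constant $\delta$ | $\Theta(1)$ | $O(1)$ |
| [Raz05] | 3 | One source: $\delta n$, any constant $\delta$. Other sources may have $k \ge \mathrm{polylog}(n)$. | $\Theta(1)$ | $O(1)$ |
| [Raz05] | 2 | One source: $(1/2 + \delta)n$, any constant $\delta$. Other source may have $k \ge \mathrm{polylog}(n)$. | $\Theta(k)$ | $2^{-\Omega(k)}$ |
| [Bou05] | 2 | $(1/2 - \alpha_0)n$ for some small universal constant $\alpha_0 > 0$ | $\Theta(n)$ | $2^{-\Omega(n)}$ |
| [Rao06] | 3 | One source: $\delta n$, any constant $\delta$. Other sources may have $k \ge \mathrm{polylog}(n)$. | $\Theta(k)$ | $2^{-k^{\Omega(1)}}$ |
| [Rao06] | $O(\log n/\log k)$ | $k \ge \mathrm{polylog}(n)$ | $\Theta(k)$ | $k^{-\Omega(1)}$ |
| [BRSW06] | $O(\log n/\log k)$ | $k \ge \mathrm{polylog}(n)$ | $\Theta(k)$ | $2^{-k^{\Omega(1)}}$ |
| [Li11a] | 3 | $k = n^{1/2+\delta}$, any constant $\delta$ | $\Theta(k)$ | $k^{-\Omega(1)}$ |
| [Li13b] | $O(\log(\log n/\log k)) + O(1)$ | $k \ge \mathrm{polylog}(n)$ | $\Theta(k)$ | $k^{-\Omega(1)}$ |
| [Li13a] | $O(1/\eta) + O(1)$, where the $O(1)$ can be large | $k \ge \log^{2+\eta} n$ | $\Theta(k)$ | $n^{-\Omega(1)} + 2^{-k^{\Omega(1)}}$ |
| This work | 3 | $k \ge \log^{12} n$ | $\Theta(k)$ | $2^{-k^{\Omega(1)}}$ |
| This work | $\lceil 14/\eta \rceil + 2$ | $k \ge \log^{2+\eta} n$ | $\Theta(k)$ | $2^{-k^{\Omega(1)}}$ |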


Table 1: Summary of Results on Extractors for Independent Sources. 
2 Overview of The Constructions and Techniques


Here we give a brief overview of our constructions and the techniques. To give a clear description 
of the ideas, we shall be informal and imprecise sometimes. 

The high level idea of our constructions still follows the framework of [Li13b, Li13a]. Thus, we
first briefly review the construction in [Li13a].

2.1 A brief review of the construction in [Li13a]

The constant-source extractor in [Li13a] works by first obtaining a somewhere random source
(SR-source for short), which is a random $N \times m$ matrix such that at least one row of the matrix is
uniform. In addition, the SR-source has the stronger property that, say, 2/3 of the rows are uniform,
and moreover they are (almost) h-wise independent with $h = k^\alpha$ for some constant $0 < \alpha < 1$. Once
we have this SR-source, we can use the lightest bin protocol from [Fei99] to reduce the number of
rows in the SR-source; after each execution of the lightest bin protocol, we use the random
strings in the output of the protocol as seeds to extract from another fresh weak source, using a
strong seeded extractor. This way we can ensure that the resulting new random variable (not the
strings from the original SR-source) is another SR-source that preserves the h-wise independent
property (as long as the output length of the seeded extractor is small, say at most k/(2h)). On
the other hand, the number of rows in this new SR-source has decreased a lot, roughly from N to
$N^{4/\sqrt{h}}$.

We can thus repeat this process until the number of rows in the SR-source becomes small
enough, say $k^{1/3}$; and then we can take at most two other independent (n, k) sources and use an
extractor from [BRSW06] to extract nearly uniform random bits. Since initially the number of
rows in the SR-source is poly(n), $k \ge \mathrm{polylog}(n)$ and $h = k^\alpha$, a simple calculation shows that the
number of iterations needed is a constant. In addition, the initial SR-source can also be obtained
from a constant number of independent (n, k) sources. Thus the total number of sources needed is
a constant. However, as mentioned before, the step of obtaining the initial SR-source may require
a large constant number of sources.
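The "simple calculation" for the number of iterations can be sketched numerically. The code below reads the per-round reduction as $N \to N^{4/\sqrt{h}}$ and the stopping threshold as $k^{1/3}$, both of which are assumptions about the intended parameters, and it works with base-2 logarithms so that astronomically large values of N and k never have to be materialized. The function name and the sample parameters are illustrative.

```python
def lightest_bin_rounds(log2_N, log2_k, log2_h):
    """Count rounds of N -> N^(4 / sqrt(h)) until N <= k^(1/3), in the log domain."""
    log2_exponent = 2 - log2_h / 2      # log2 of 4 / sqrt(h)
    if log2_exponent >= 0:
        raise ValueError("need h > 16 so that 4 / sqrt(h) < 1")
    target = log2_k / 3                 # log2 of the threshold k^(1/3)
    rounds = 0
    while log2_N > target:
        log2_N *= 2 ** log2_exponent    # log N shrinks by the factor 4 / sqrt(h)
        rounds += 1
    return rounds

# Sample regime: log2 n = 1024, N = n^3, k = (log2 n)^12 = 2^120, h = k^0.1 = 2^12.
print(lightest_bin_rounds(log2_N=3 * 1024, log2_k=120, log2_h=12))
```

Because log N is polynomial in log n while each round divides it by $\sqrt{h}/4 = k^{\alpha/2}/4$, a polylogarithmic k makes the round count a constant depending only on the exponents.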

2.2 The new construction 

We now describe our new construction of the three-source extractor. Again, we will first obtain
an SR-source such that, say, 2/3 of the rows are uniform, and moreover they are (almost) h-wise
independent with $h = k^\alpha$ for some constant $0 < \alpha < 1$. However, we will use just two independent
(n, k) sources to achieve this. This is our major improvement over the construction in [Li13a].
To explain the ideas, we will first show how to use three independent (n, k) sources to obtain the
SR-source.

2.2.1 Use three sources to obtain the h-wise independent SR-source

In [Li13a], the initial SR-source with the h-wise independent property is obtained in two steps.
First, one uses a constant number of independent (n, k) sources to obtain a random variable that
is statistically close to an SR-source such that, say, 2/3 of the rows are uniform (but without the
h-wise independent property). Then one can use a single extra independent (n, k) source to obtain
a new SR-source with the h-wise independent property. It is the first step that uses a large
number of independent sources. The reason is that if we take a seeded extractor with seed length
$d = \log n + C\log(1/\epsilon)$ for some $\epsilon = 1/\mathrm{poly}(n)$ and convert a weak source into a somewhere (close
to) random source by trying all possible values of the seed and then concatenating the outputs,
then the number of rows is $N = 2^d > (1/\epsilon)^C$. In addition, the best one can say about the close to
uniform rows is that each one is $\epsilon$-close to uniform (or even worse). Thus if we want the source to
be statistically close to an SR-source such that $\frac{2}{3}N$ rows are simultaneously uniform, by the union
bound we would need the error of the close to uniform rows to be smaller than $\epsilon^C$. Thus, it takes
the XOR of at least C + 1 independent sources applied with the seeded extractor to reduce the
error to this level.

Here we take a completely different approach. Since eventually we need the error of the close
to uniform rows in the source (obtained by applying a seeded extractor to an (n, k) source X and
trying all possible values of the seed) to be small, we might as well just start with a seeded extractor
with larger seed length, say $\ell = k^\beta > \log n$, where $0 < \beta < 1$ is another constant. Now if we use
an optimal strong seeded extractor $\mathrm{Ext}_2$ such as that in [GUV09], we can indeed show that the
error of the close to uniform rows is $\epsilon = 2^{-\Omega(k^\beta)}$, which is small enough. Moreover, by a standard
averaging argument we can show that at least 0.9 fraction of the rows are $\epsilon$-close to uniform.

However, by naively doing this, we have increased the number of rows in the somewhere (close
to) uniform source (which we will call $\bar{X}$) to $2^\ell = 2^{k^\beta}$, which is super-polynomial and also much
larger than $1/\epsilon$, so it seems that we have gained nothing. Fortunately, so far we have just used
one weak source. Thus we can take another weak source and use it to sample a subset of poly(n)
rows from $\bar{X}$, and hopefully with high probability conditioned on the second source, the sampled
subset of rows still contains a large fraction of close to uniform rows. If this is true then we
are done, since now we only have poly(n) rows and the error of each close to uniform row is
$\epsilon = 2^{-\Omega(k^\beta)} \ll 1/\mathrm{poly}(n)$; so we can show that this new source is
$\mathrm{poly}(n)\, 2^{-\Omega(k^\beta)} = 2^{-k^{\Omega(1)}}$-close
to an SR-source such that, say, 2/3 of the rows are uniform.

Given this idea, it is straightforward to implement it. To sample from a set of elements using
a weak random source, it suffices to take a seeded extractor, which is equivalent to a sampler as
shown in [Zuc97]. More specifically, take a seeded $(k' = k/2, \epsilon')$ extractor $\mathrm{Ext}_1$ with seed length
$d = O(\log n + \log(1/\epsilon'))$ and output length $\ell = k^\beta < 0.4k$ such as that in [GUV09]. We can view it
as a bipartite graph with $2^n$ vertices on the left, $2^\ell$ vertices on the right, and left degree $2^d$. Thus
each vertex on the left selects a subset of right vertices with size $2^d$. Now if we associate the right
vertices with the $2^\ell$ rows in $\bar{X}$, we can use another independent (n, k) source Y to sample a vertex
on the left, which gives us a subset of the rows in $\bar{X}$ with size $2^d$.

We say a row in $\bar{X}$ is “good” if it is $\epsilon$-close to uniform. Thus at least 0.9 fraction of the rows are
good. A standard property of the $(k', \epsilon')$ seeded extractor implies that the number of left vertices
whose induced subset of rows in $\bar{X}$ contains less than $0.9 - \epsilon'$ fraction of good rows is at most
$2^{k'}$. Since Y is an (n, k) source, the probability of selecting a subset of rows which contains at
least $0.9 - \epsilon'$ fraction of good rows is at least $1 - 2^{k'} 2^{-k} = 1 - 2^{-k/2}$. Thus it suffices to take
$\epsilon' = 1/5$ and we know that with probability at least $1 - 2^{-k/2}$ over Y, the selected subset of rows
of $\bar{X}$ has at least $0.9 - 1/5 = 0.7 > 2/3$ fraction of good rows. Moreover, since $\epsilon'$ is a constant we have that
$d = O(\log n + \log(1/\epsilon')) = O(\log n)$, therefore the size of the selected subset is $2^d = \mathrm{poly}(n)$.
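The counting behind this sampling step can be spelled out as follows. This is a worked restatement of the bound above, writing $B$ (a name introduced here) for the set of left vertices whose sampled subset contains too few good rows.

```latex
\begin{gather*}
|B| \le 2^{k'} = 2^{k/2}, \qquad
\Pr[Y \in B] \le |B| \cdot 2^{-H_\infty(Y)} \le 2^{k/2} \cdot 2^{-k} = 2^{-k/2}, \\
\epsilon' = \Theta(1) \;\Longrightarrow\; d = O(\log n + \log(1/\epsilon')) = O(\log n)
\;\Longrightarrow\; 2^{d} = \mathrm{poly}(n).
\end{gather*}
```

The first line uses only that every point of an (n, k) source has probability at most $2^{-k}$; the second shows why a constant sampler error keeps the number of sampled rows polynomial.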

Note that the above sampling process is equivalent to computing $\mathrm{Ext}_2(X, \mathrm{Ext}_1(Y, r_i))$ for all
possible values $r_i$ of the d-bit seed of $\mathrm{Ext}_1$. Thus (although we are sampling from a set of
super-polynomial size) this can be done in polynomial time. Hence, we have used two independent (n, k)
sources to obtain a new source W such that with high probability, W is statistically close to an
SR-source which has a 2/3 fraction of uniform rows. We can now take another independent source Z
and use the method in [Li13a] to get an SR-source with the h-wise independent property.
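The data flow of the sampling step, computing one row per seed of the first extractor, can be sketched in a few lines. The code below is only a structural illustration: `toy_ext` is a hash-based placeholder introduced here, which is NOT a proven seeded extractor and is not the [GUV09] construction, and the parameter names `d`, `ell`, `m` are chosen for this sketch.

```python
import hashlib

def toy_ext(source, seed, out_bits, tag):
    """Placeholder for a seeded extractor (a hash is NOT a proven extractor)."""
    data = tag + len(seed).to_bytes(4, "big") + seed + source
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - out_bits)  # keep the top out_bits bits

def sample_rows(x, y, d, ell, m):
    """Rows W_i = Ext2(X, Ext1(Y, r_i)) for all 2^d seeds r_i of Ext1.

    Ext1(Y, r_i) is an ell-bit index of a row of the (super-polynomially large)
    table X-bar; that row is computed on demand as Ext2(X, index).
    """
    rows = []
    for r in range(2 ** d):
        seed1 = r.to_bytes((d + 7) // 8, "big")
        row_index = toy_ext(y, seed1, ell, b"Ext1")
        seed2 = row_index.to_bytes((ell + 7) // 8, "big")
        rows.append(toy_ext(x, seed2, m, b"Ext2"))
    return rows

rows = sample_rows(b"bits of source X", b"bits of source Y", d=4, ell=32, m=16)
print(len(rows))
```

The loop touches only $2^d$ of the $2^\ell$ rows of the large table, which is why the procedure runs in polynomial time when $d = O(\log n)$.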

Furthermore, notice that by doing this we have reduced the error from 1/poly(n) in [Li13a]
to $2^{-k^{\Omega(1)}}$. Essentially, with one source we can only obtain an SR-source with poly(n) rows such
that some rows are 1/poly(n)-close to uniform; but with two independent sources we can obtain
an SR-source with poly(n) rows such that some rows are $2^{-k^{\Omega(1)}}$ (or even $2^{-\Omega(k)}$)-close to uniform.
In fact, this method is quite general and can be applied to any construction that involves reducing
the error in an SR-source. For example, it can also be used to reduce the error of the extractor in
[Rao06] from 1/poly(n) to $2^{-k^{\Omega(1)}}$. On the other hand, the method used in [BRSW06] to reduce the
error of the extractor in [Rao06] cannot be directly applied to the construction in [Li13a], since the
construction in [Li13a] has a special structure (XORing several independent copies of SR-sources).

2.2.2 Use two sources to obtain the h-wise independent SR-source

We now describe how we can remove one source, and use just two independent (n, k) sources to
obtain the h-wise independent SR-source. First, we briefly review the method to generate the
h-wise independent SR-source in [Li13a]. Given an SR-source Y and an independent source X, we
will use each row of Y to do several rounds of alternating extraction (cf. [DW09, Li12, Li15]) from
X. More specifically, we divide the binary expression of the index of the row of Y into blocks of
size log h, and for each block we run an alternating extraction from X and pick an output indexed
by that block. This output is then used to start the next round of alternating extraction. The final
output will be the output of the alternating extraction in the last round, indexed by the last block
of the binary expression of the index of that row (more details can be found in [Li13a]). The new
SR-source Z will then be the concatenation of the outputs for all rows.

In each alternating extraction the seed length of the seeded extractor is chosen to be $\ell = k^\beta$,
and one can show the following. For any subset of rows in Y with size h, if all these rows are
uniform (but they may depend on each other arbitrarily), then with probability $1 - 2^{-\Omega(\ell)}$ over the
fixing of Y, the joint distribution of the corresponding rows in Z is $2^{-\Omega(\ell)}$-close to uniform (i.e., Z
has the almost h-wise independent property).

Now we go back to our new construction. We have already used two independent sources Y
and X to obtain an SR-source W with N = poly(n) rows, such that with probability $1 - 2^{-k/2}$
over the fixing of Y, there exists a large subset $T \subseteq [N]$ such that each row of W with index in
T is $2^{-\Omega(\ell)}$-close to uniform. Moreover we will have $\mathrm{Ext}_2$ output $\ell$ bits so that each row in W has
length $\ell$. We will now take another optimal seeded extractor, and then use each row of W as the
seed to extract from Y and output k/2 bits. Let the concatenation of these outputs be $\bar{Y}$. We will
now think of $\bar{Y}$ as an SR-source, and X as an independent source, and use the same method in
[Li13a] described above to obtain the new SR-source Z from $\bar{Y}$ and X.

We will show that with high probability over the fixing of Y, the new SR-source Z has the
desired h-wise independent property. Note that with probability $1 - 2^{-k/2}$ over the fixing of Y,
there exists a large subset $T \subseteq [N]$ such that each row of W with index in T is $2^{-\Omega(\ell)}$-close to
uniform. If for every $y \in \mathrm{Supp}(Y)$ that makes this happen, we can show that conditioned on Y = y,
the new source Z also has the desired h-wise independent property in the subset T of rows, then
we are done. However, this may not be the case. Thus, we want to subtract from $1 - 2^{-k/2}$ the
probability mass of the “bad” y's which result in a Z that does not have the h-wise independent
property in the subset T of rows. Towards this goal, we define a bad $y \in \mathrm{Supp}(Y)$ to be a string
that satisfies the following two properties:


a) Conditioned on the fixing of Y = y, there exists a large subset $T \subseteq [N]$ such that
each row of W with index in T is $2^{-\Omega(\ell)}$-close to uniform,

and

b) Conditioned on the fixing of Y = y, there exists a subset $S \subseteq T$ with |S| = h such that
the joint distribution of the rows of Z with index in S is $\epsilon_1$-far from uniform, where $\epsilon_1$
is an error parameter to be chosen later.

Note that since $S \subseteq T$ and y satisfies condition a), we must have that conditioned on the fixing of
Y = y, each row of W with index in S is $2^{-\Omega(\ell)}$-close to uniform. Therefore, for each $S \subseteq [N]$ with
|S| = h we now define an event $\mathrm{Bad}_S$ to be the set of y's in Supp(Y) that satisfy the following
two properties:

c) Conditioned on the fixing of Y = y, each row of W with index in S is $2^{-\Omega(\ell)}$-close to
uniform,

and

d) Conditioned on the fixing of Y = y, the joint distribution of the rows of Z with index
in S is $\epsilon_1$-far from uniform.

Thus every bad y must belong to some $\mathrm{Bad}_S$. Therefore to bound the probability mass of the
bad y's we only need to bound $\Pr[\mathrm{Bad}_S]$ for every S and then take a union bound. Now the crucial
observation is that for any fixed subset S, property c) is determined by the h random variables
$R_i = \mathrm{Ext}_1(Y, r_i)$ with $i \in S$. Let R be the concatenation of $\{R_i, i \in S\}$ (which is a deterministic
function of Y), and define the event $A_S$ to be the set of r's in Supp(R) that make property c)
satisfied. Then we have $\Pr[\mathrm{Bad}_S] = \sum_{r \in A_S} \Pr[R = r] \Pr[\mathrm{Bad}_S \mid R = r]$.

Now another crucial observation is that the size of R is small. Indeed, it is bounded by
$h\ell = k^{\alpha+\beta}$. If we choose $\alpha, \beta$ such that $\alpha + \beta < 1$, then the size of R is o(k) and we can argue
that with probability $1 - 2^{-\ell}$ over the fixing of R = r, we have that Y still has min-entropy at least
$k - o(k) - \ell = k - o(k) > 0.9k$. Moreover, conditioned on the fixing of R = r we have that $\{W_i, i \in S\}$
is a deterministic function of X, and is thus independent of Y.

We now bound $\Pr[\mathrm{Bad}_S \mid R = r]$ in two cases. First, if $H_\infty(Y \mid R = r) < 0.9k$, we will just use
$\Pr[\mathrm{Bad}_S \mid R = r] \le 1$. By the above argument this happens with probability at most $2^{-\ell}$. We now
consider the case where $H_\infty(Y \mid R = r) \ge 0.9k$. In this case, we know that for all $i \in S$, $W_i$ is
$2^{-\Omega(\ell)}$-close to uniform. Thus the joint distribution of $\{W_i, i \in S\}$ is $h 2^{-\Omega(\ell)} = 2^{-\Omega(\ell)}$-close (since
$h = k^\alpha$ and $\ell = k^\beta$) to a source with h truly uniform rows. Ignoring the error for the moment, we
can now say that for all $i \in S$, $|(\bar{Y}_i, W_i) - (U_\ell, W_i)| \le 2^{-\Omega(\ell)}$. Thus for all $i \in S$, with probability
$1 - 2^{-\Omega(\ell)}$ over the fixing of $W_i$, we have that $\bar{Y}_i$ is $2^{-\Omega(\ell)}$-close to uniform. This implies that with
probability $1 - h 2^{-\Omega(\ell)} = 1 - 2^{-\Omega(\ell)}$ over the fixing of $\{W_i, i \in S\}$, we have that the joint distribution
of $\{\bar{Y}_i, i \in S\}$ is $h 2^{-\Omega(\ell)} = 2^{-\Omega(\ell)}$-close to a source with h truly uniform rows. Moreover, notice
that the size of $\{W_i, i \in S\}$ is also bounded by $h\ell = k^{\alpha+\beta}$. Thus again we can argue that with
probability $1 - 2^{-\ell}$ over the fixing of $\{W_i, i \in S\}$, we have that X still has min-entropy at least
$k - o(k) - \ell > 0.9k$. Altogether, this implies that with probability $1 - 2^{-\Omega(\ell)} - 2^{-\ell} = 1 - 2^{-\Omega(\ell)}$
over the fixing of $\{W_i, i \in S\}$, we have that the joint distribution of $\{\bar{Y}_i, i \in S\}$ is $2^{-\Omega(\ell)}$-close to a
source with h truly uniform rows, and X still has min-entropy at least 0.9k. In addition, after this
further fixing of $\{W_i, i \in S\}$, we have that $\{\bar{Y}_i, i \in S\}$ is a deterministic function of Y, and is thus
independent of X.




We can now use the same argument as in [Li13a] (treat $\{Y_i, i \in S\}$ as the SR-source and $X$ as an independent weak source) to argue that with probability $1 - 2^{-\Omega(\ell)}$ over the fixing of $\{Y_i, i \in S\}$ (and thus also the fixing of $Y$, since $\{Y_i, i \in S\}$ is now a deterministic function of $Y$), the joint distribution of $\{Z_i, i \in S\}$ is $2^{-\Omega(\ell)}$-close to uniform. Now adding back all the errors, the above statement is still true (except for a slight change of constants in $\Omega(\cdot)$). Thus, if we set $\epsilon_1$ to be some $2^{-\Omega(\ell)}$ appropriately, then in this case $\Pr[\mathrm{Bad}_S \mid R = r] \le 2^{-\Omega(\ell)}$. Therefore, by combining the two cases, we get that $\Pr[\mathrm{Bad}_S] \le 2^{-\ell} + \Pr[A_S]\, 2^{-\Omega(\ell)} \le 2^{-\Omega(\ell)}$.

Now by the union bound we know the probability mass of the bad $y$'s is at most $\binom{N}{h} 2^{-\Omega(\ell)} \le N^h 2^{-\Omega(\ell)} = 2^{-\Omega(\ell) + O(h \log n)}$. If we choose $\alpha, \beta$ such that $k^{\beta-\alpha} \ge C \log n$ for some large enough constant $C > 1$, then this probability mass is again $2^{-\Omega(\ell)}$. Also, by choosing the constant $C$ appropriately, this will also ensure that the error of the $h$-wise independent rows (which is $2^{-\Omega(\ell)}$) is less than $N^{-6h}$. This will be enough for the lightest bin protocol to work, as shown in [Li13a]. All these requirements, as well as other requirements in obtaining the $h$-wise independent SR-source, can be satisfied as long as $k = \log^{2+\eta} n$ for any constant $\eta > 0$ (see Algorithm 5.13).
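
The arithmetic behind this union bound can be spelled out as follows. This is only a sketch: the constants $c$ and $a$ are hypothetical names (not from the paper) for the hidden constant in the per-subset bound $2^{-c\ell}$ and for the exponent in $N \le n^{a}$.

```latex
\binom{N}{h}\, 2^{-c\ell}
  \;\le\; N^{h}\, 2^{-c\ell}
  \;\le\; 2^{\,a h \log n \;-\; c\ell}
  \;\le\; 2^{-(c - a/C)\,\ell}
  \qquad \text{whenever } \ell \ge C h \log n .
```

With $h = k^{\alpha}$ and $\ell = k^{\beta}$, the condition $\ell \ge C h \log n$ is exactly $k^{\beta-\alpha} \ge C \log n$, and the exponent stays negative as soon as $C > a/c$.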

Now we are done. Subtracting the probability mass of the bad $y$'s from $1 - 2^{-k/2}$, we get that with probability $1 - 2^{-\Omega(\ell)}$ over the fixing of $Y$, the source $Z$ has the desired $h$-wise independent property.

2.2.3 Achieving a three-source extractor 

Now that we have used two independent sources to obtain an SR-source with the $h$-wise independent property, we can use the rest of the construction in [Li13a] to get an extractor. However, the direct use of the construction in [Li13a] requires at least two more sources. This is because the lightest bin protocol requires at least one round, and at the end of that round we need to use a fresh source to get another SR-source. We then need to take another source in order to finish extraction. This will give us a four-source extractor.

In order to save one source, we observe that if the entropy $k$ is a large enough polynomial in $\log n$, then $h = k^{\alpha}$ will also be large enough so that in just one iteration of the lightest bin protocol, the number of rows in the SR-source will decrease from $N = \mathrm{poly}(n)$ to, say, $N' \le k^{1/3}$. We let the concatenation of these rows of $Z$ be $Z'$. Note that $Z'$ is a deterministic function of $Z$. By cutting the length of each row of $Z$ (if necessary) to, say, $\sqrt{k}$, we see that the size of $Z'$ is bounded by $N'\sqrt{k} \le k^{5/6}$. At the end of the lightest bin protocol we will take a fresh weak source $Y_2$ (this is the third source) and use each row of $Z'$ to extract a string of length, say, $0.9k$ from $Y_2$ (by using an optimal seeded extractor). We let the concatenation of these outputs be $Y'$. The analysis in [Li13a] implies that with high probability over the fixing of $Z$, the new source $Y'$ is also (close to) an SR-source (here it is not necessary to have the $h$-wise independent property).

Note that $Y'$ is a deterministic function of $Y_2$ and $Z'$, and $Z'$ is a deterministic function of $Z$. Moreover, conditioned on the fixing of $Y$, $Z$ is a deterministic function of $X$. Thus it is also true that with high probability over the fixing of $Z'$, the new source $Y'$ is close to an SR-source. Since the size of $Z'$ is $o(k)$, we can argue that with high probability over the fixing of $Z'$, the min-entropy of $X$ is $k - o(k) > 0.9k$. Moreover, conditioned on the fixing of $(Y, Z')$, $X$ and $Y'$ are independent. Note that $Y'$ is an SR-source with $k^{1/3}$ rows but each row has length $0.9k \gg k^{1/3}$, thus by using an extractor from [BRSW06] we can extract random bits from $X$ and $Y'$ which are $2^{-k^{\Omega(1)}}$-close to uniform. This gives our three-source extractor with error $2^{-k^{\Omega(1)}}$. It turns out that it is enough to choose $k \ge \log^{12} n$ and $\alpha = 1/6$, $\beta = 1/3$ in this case. Also notice here that $Y$ and $Y_2$ need not be independent; rather it suffices to have $(Y, Y_2)$ be a block source (since the analysis first conditions on the fixing of $Y$). Thus our construction actually gives an extractor for one $(n, k)$ source and another independent $(k, k)$-block source (see Algorithm 5.9).

2.2.4 Improving the results of [Li13a] for smaller min-entropy

Our three-source extractor requires $k \ge \log^{12} n$. However, if $k = \log^{2+\eta} n$ for some small constant $\eta > 0$, then we can also get improved results by replacing the step of obtaining the $h$-wise independent SR-source in [Li13a] with our new construction, which uses only two independent sources. This way we get a constant-source extractor with error $2^{-k^{\Omega(1)}}$.

Moreover, once we have this SR-source, running the lightest bin protocol actually does not need fully independent sources. For example, if $X = (X_1, \cdots, X_t)$ and $Y = (Y_1, \cdots, Y_t)$ are two independent block sources where each block has min-entropy $k$ conditioned on all previous blocks, then we can first obtain the SR-source $Z$ from $(X_1, Y_1)$. Now we know that with high probability conditioned on the fixing of $Y_1$, the source $Z$ has the desired property; moreover it is a deterministic function of $X$. Thus we can run the lightest bin protocol once and take a new block from $Y$ to obtain a new SR-source $Z_2$, which is a deterministic function of $Y$ conditioned on $Z$; we can then run the lightest bin protocol again and take a new block from $X$ to obtain a new SR-source $Z_3$, which is a deterministic function of $X$ conditioned on $Z_2$, and so on. This gives us an extractor for two independent block sources, each having a constant number of blocks (see Algorithm 5.13).

Organization. The rest of the paper is organized as follows. We give some preliminaries in Section 3. In Section 4 we define alternating extraction, an important ingredient in our construction. We present our main construction of extractors in Section 5. Finally we conclude with some open problems in Section 6.

3 Preliminaries 

We often use capital letters for random variables and corresponding small letters for their instantiations. Let $|S|$ denote the cardinality of the set $S$. For $\ell$ a positive integer, $U_\ell$ denotes the uniform distribution on $\{0,1\}^{\ell}$. When used as a component in a vector, each $U_\ell$ is assumed independent of the other components. All logarithms are to the base 2.

3.1 Probability distributions 

Definition 3.1 (statistical distance). Let $W$ and $Z$ be two distributions on a set $S$. Their statistical distance (variation distance) is

$$\Delta(W, Z) = \max_{T \subseteq S} |W(T) - Z(T)| = \frac{1}{2} \sum_{s \in S} |W(s) - Z(s)|.$$

We say $W$ is $\epsilon$-close to $Z$, denoted $W \approx_\epsilon Z$, if $\Delta(W, Z) \le \epsilon$. For a distribution $D$ on a set $S$ and a function $h : S \to T$, let $h(D)$ denote the distribution on $T$ induced by choosing $x$ according to $D$ and outputting $h(x)$.
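
As a small illustration of Definition 3.1, the following sketch computes the statistical distance of two finite distributions in both forms given above. The function names and the dictionary representation are assumptions made for this example, not notation from the paper.

```python
from fractions import Fraction


def statistical_distance(w, z):
    """Half the L1 distance: 1/2 * sum_s |W(s) - Z(s)|.

    w, z: dicts mapping outcomes to probabilities (hypothetical representation).
    """
    support = set(w) | set(z)
    total = sum(abs(Fraction(w.get(s, 0)) - Fraction(z.get(s, 0))) for s in support)
    return total / 2


def max_event_gap(w, z):
    """max_T |W(T) - Z(T)|, attained by the event T = {s : W(s) > Z(s)}."""
    support = set(w) | set(z)
    return sum(max(Fraction(w.get(s, 0)) - Fraction(z.get(s, 0)), Fraction(0))
               for s in support)


# A slightly biased 2-bit source versus the uniform distribution U_2.
biased = {"00": Fraction(1, 2), "01": Fraction(1, 4), "10": Fraction(1, 4)}
uniform = {s: Fraction(1, 4) for s in ("00", "01", "10", "11")}
distance = statistical_distance(biased, uniform)
```

Exact rational arithmetic is used so that the two expressions in the definition can be compared for equality without rounding concerns.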
3.2 Somewhere Random Sources and Extractors

Definition 3.2 (Somewhere Random sources). A source $X = (X_1, \cdots, X_t)$ is $(t \times r)$ somewhere-random (SR-source for short) if each $X_i$ takes values in $\{0,1\}^r$ and there is an $i$ such that $X_i$ is uniformly distributed.

Definition 3.3 (Block Sources). A distribution $X = X_1 \circ X_2 \circ \cdots \circ X_t$ is called a $(k_1, k_2, \cdots, k_t)$ block source if for all $i = 1, \cdots, t$, we have that for all $x_1 \in \mathrm{Supp}(X_1), \cdots, x_{i-1} \in \mathrm{Supp}(X_{i-1})$, $H_\infty(X_i \mid X_1 = x_1, \cdots, X_{i-1} = x_{i-1}) \ge k_i$, i.e., each block has high min-entropy even conditioned on any fixing of the previous blocks. If $k_1 = k_2 = \cdots = k_t = k$, we say that $X$ is a $k$ block source.

3.3 Prerequisites from previous work 

For a strong seeded extractor with optimal parameters, we use the following extractor constructed 
in [GUV09]. 

Theorem 3.4 ([GUV09]). For every constant $\alpha > 0$, and all positive integers $n, k$ and any $\epsilon > 0$, there is an explicit construction of a strong $(k, \epsilon)$-extractor $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ with $d = O(\log n + \log(1/\epsilon))$ and $m \ge (1 - \alpha)k$.

Theorem 3.5 ([BRSW06]). For every $n, k(n)$ with $k \ge \log^2 n$, and any constants $0 < \eta < 1$, $0 < \gamma < 1/2$ such that $k^{1-2\gamma} \ge \log^{1.1} n$, there exist constants $0 < \alpha, \beta < 1$ and a polynomial time computable function $\mathrm{BasicExt} : \{0,1\}^n \times \{0,1\}^{k^{\gamma+1}} \to \{0,1\}^m$ s.t. if $X$ is an $(n, k)$ source and $Y$ is a $(k^{\gamma} \times k)$ $(k - k^{\beta})$-SR-source,

$$|(Y, \mathrm{BasicExt}(X, Y)) - (Y, U_m)| < \epsilon$$

and

$$|(X, \mathrm{BasicExt}(X, Y)) - (X, U_m)| < \epsilon$$

where $U_m$ is independent of $X, Y$, $m = (1 - \eta)k$ and $\epsilon = 2^{-k^{\alpha}}$.

Remark 3.6. The original version of [BRSW06] requires $k \ge \log^{10} n$. But this is only because the output length is $m = k - k^{\Omega(1)}$, and to achieve such output length, currently the best known seeded extractor requires seed length $d = O(\log^3(n/\epsilon))$. If we only need to achieve output length $m = (1 - \eta)k$, then we can use a seeded extractor with seed length $d = O(\log(n/\epsilon))$, such as [GUV09]. Then it suffices to have $k \ge \log^2 n$ for some properly chosen $\alpha, \beta$.

The following standard lemma about conditional min-entropy is implicit in [NZ96] and explicit 
in [MW97]. 

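Lemma 3.7 ([MW97]). Let $X$ and $Y$ be random variables and let $\mathcal{Y}$ denote the range of $Y$. Then for all $\epsilon > 0$, one has

$$\Pr_{Y}\left[ H_\infty(X \mid Y = y) \ge H_\infty(X) - \log|\mathcal{Y}| - \log\left(\frac{1}{\epsilon}\right) \right] \ge 1 - \epsilon.$$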
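
The guarantee of Lemma 3.7 (in its standard form, with the $\log(1/\epsilon)$ loss) can be checked numerically on a toy example. In the sketch below, the joint distribution, the function names, and the choice of $Y = X \bmod 4$ as the leaked value are illustrative assumptions, not part of the paper.

```python
import math


def min_entropy(dist):
    """H_inf of a distribution given as a dict outcome -> probability."""
    return -math.log2(max(dist.values()))


def good_mass(joint, eps):
    """Pr_y[ H_inf(X | Y=y) >= H_inf(X) - log|range(Y)| - log(1/eps) ].

    joint: dict mapping (x, y) pairs to probabilities; the range of Y is
    taken to be its support (an assumption of this sketch).
    """
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    threshold = min_entropy(px) - math.log2(len(py)) - math.log2(1.0 / eps)
    mass = 0.0
    for y, q in py.items():
        conditional = {x: p / q for (x, yy), p in joint.items() if yy == y}
        # Small tolerance guards against floating-point rounding at equality.
        if min_entropy(conditional) >= threshold - 1e-9:
            mass += q
    return mass


# X uniform over 16 values (4 bits of min-entropy); Y leaks X mod 4 (2 bits).
joint = {(x, x % 4): 1.0 / 16 for x in range(16)}
eps = 0.5
mass = good_mass(joint, eps)
```

Here every fixing $Y = y$ leaves $X$ with exactly $4 - 2 = 2$ bits of min-entropy, which is above the threshold $4 - 2 - \log(1/\epsilon)$, so the whole probability mass is "good".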


We also need the following lemma. 

Lemma 3.8 ([Li15]). Let $(X, Y)$ be a joint distribution such that $X$ has range $\mathcal{X}$ and $Y$ has range $\mathcal{Y}$. Assume that there is another random variable $X'$ with the same range as $X$ such that $|X - X'| = \epsilon$. Then there exists a joint distribution $(X', Y)$ such that $|(X, Y) - (X', Y)| = \epsilon$.
4 Alternating Extraction


As in [Li13a], an important ingredient in the construction of our extractors is the following alternating extraction protocol.

Quentin: Q, S_1                                 Wendy: X

S_1                        --- S_1 --->
                           <--- R_1 ---         R_1 = Ext_w(X, S_1)
S_2 = Ext_q(Q, R_1)        --- S_2 --->
                           <--- R_2 ---         R_2 = Ext_w(X, S_2)
...
S_t = Ext_q(Q, R_{t-1})    --- S_t --->
                                                R_t = Ext_w(X, S_t)

Figure 1: Alternating Extraction.


Alternating Extraction. Assume that we have two parties, Quentin and Wendy. Quentin has a source $Q$, Wendy has a source $X$. Also assume that Quentin has a uniform random seed $S_1$ (which may be correlated with $Q$). Suppose that $(Q, S_1)$ is kept secret from Wendy and $X$ is kept secret from Quentin. Let $\mathrm{Ext}_q$, $\mathrm{Ext}_w$ be strong seeded extractors with optimal parameters, such as that in Theorem 3.4. Let $\ell$ be an integer parameter for the protocol. For some integer parameter $t > 0$, the alternating extraction protocol is an interactive process between Quentin and Wendy that runs in $t$ steps.

In the first step, Quentin sends $S_1$ to Wendy, and Wendy computes $R_1 = \mathrm{Ext}_w(X, S_1)$. She sends $R_1$ to Quentin and Quentin computes $S_2 = \mathrm{Ext}_q(Q, R_1)$. In this step $R_1, S_2$ each have $\ell$ bits. In each subsequent step $i$, Quentin sends $S_i$ to Wendy, and Wendy computes $R_i = \mathrm{Ext}_w(X, S_i)$. She replies $R_i$ to Quentin and Quentin computes $S_{i+1} = \mathrm{Ext}_q(Q, R_i)$. In step $i$, $R_i, S_{i+1}$ each have $\ell$ bits. Therefore, this process produces the following sequence:

$$S_1,\; R_1 = \mathrm{Ext}_w(X, S_1),\; S_2 = \mathrm{Ext}_q(Q, R_1),\; \cdots,\; S_t = \mathrm{Ext}_q(Q, R_{t-1}),\; R_t = \mathrm{Ext}_w(X, S_t).$$

Look-Ahead Extractor. Now we can define our look-ahead extractor. Let $Y = (Q, S_1)$ be a seed; the look-ahead extractor is defined as

$$\mathrm{laExt}(X, Y) = \mathrm{laExt}(X, (Q, S_1)) \stackrel{\mathrm{def}}{=} R_1, \cdots, R_t.$$
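
The message flow of alternating extraction and the look-ahead extractor can be sketched in a few lines. This is only an illustration of the data flow: `toy_ext` is a hypothetical stand-in built from a hash function, and it is not a strong seeded extractor in the sense of Theorem 3.4; the function names and byte-string representation are assumptions of this sketch.

```python
import hashlib


def toy_ext(source: bytes, seed: bytes, out_len: int) -> bytes:
    """Hypothetical stand-in for Ext(source, seed); NOT a real extractor."""
    prefix = len(seed).to_bytes(4, "big")  # unambiguous seed/source framing
    return hashlib.shake_256(prefix + seed + source).digest(out_len)


def la_ext(x: bytes, q: bytes, s1: bytes, t: int) -> list:
    """Run t steps of alternating extraction and return (R_1, ..., R_t).

    x  : Wendy's source X
    q  : Quentin's source Q
    s1 : Quentin's initial seed S_1; its length plays the role of ell
    """
    ell = len(s1)
    s = s1
    outputs = []
    for _ in range(t):
        r = toy_ext(x, s, ell)   # Wendy:   R_i     = Ext_w(X, S_i)
        outputs.append(r)
        s = toy_ext(q, r, ell)   # Quentin: S_{i+1} = Ext_q(Q, R_i)
    return outputs               # the last S_{t+1} is computed but unused


x_src = bytes(range(32))
q_src = bytes(range(32, 64))
seed = b"\x00" * 8
rs = la_ext(x_src, q_src, seed, 4)
```

Each output $R_{i+1}$ depends on $X$ only through a seed that Quentin derived from $R_i$, which is the alternation the analysis of Lemma 4.1 relies on.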

The following lemma is proved in [Li13a].

Lemma 4.1. Let $Y = (Q, S_1)$ where $Q$ is an $(n_q, k_q)$ source and $S_1$ is the uniform distribution over $\ell$ bits. Let $Y_2 = (Q_2, S_{21}), \cdots, Y_h = (Q_h, S_{h1})$ be another $h - 1$ random variables with the same range as $Y$ that are arbitrarily correlated to $Y$. Assume that $X$ is an $(n, k)$ source independent of $(Y, Y_2, \cdots, Y_h)$, such that $k \ge ht\ell + 10\ell + 2\log(1/\epsilon)$ and $k_q \ge ht\ell + 10\ell + 2\log(1/\epsilon)$. Assume that $\mathrm{Ext}_q$ and $\mathrm{Ext}_w$ are strong seeded extractors that use $\ell$ bits to extract from $(n_q, 10\ell)$ sources and $(n, 10\ell)$ sources respectively, with error $\epsilon$ and $\ell = O(\log(\max\{n_q, n\}) + \log(1/\epsilon))$. Let $(R_1, \cdots, R_t) = \mathrm{laExt}(X, Y)$ and $(R_{i1}, \cdots, R_{it}) = \mathrm{laExt}(X, Y_i)$ for $i = 2, \cdots, h$. Then for any $0 \le j \le t - 1$, we have

$$(Y, Y_2, \cdots, Y_h, \{R_{i1}, \cdots, R_{ij}, i = 2, \cdots, h\}, R_{j+1}) \approx_{\epsilon_1} (Y, Y_2, \cdots, Y_h, \{R_{i1}, \cdots, R_{ij}, i = 2, \cdots, h\}, U_\ell),$$

where $\epsilon_1 = O(t\epsilon)$.


5 The Extractor 

In this section we give our main construction. We will take two parameters $0 < \alpha < \beta < 1$ and let $h \approx k^{\alpha}$ and $\ell = k^{\beta}$. The first step is to obtain an SR-source such that a large fraction of the rows are roughly $h$-wise independent. We have the following claim and lemma.

Claim 5.1. Let $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ be a $(k, \epsilon)$ seeded extractor. For any $T \subseteq \{0,1\}^m$ and $p = |T|/2^m$, let $\mathrm{Bad}_T = \{x \in \{0,1\}^n : \Pr_{r \leftarrow U_d}[\mathrm{Ext}(x, r) \in T] > p + \epsilon\}$. Then

$$|\mathrm{Bad}_T| < 2^k.$$

Proof. Suppose not. Then there exist a $T \subseteq \{0,1\}^m$ and $p = |T|/2^m$ such that $|\mathrm{Bad}_T| \ge 2^k$. Now let $X$ be the uniform distribution over the set $\mathrm{Bad}_T$, so that $X$ is an $(n, k)$ source. Let $R$ be the uniform distribution over $\{0,1\}^d$. Then for any $x \in \mathrm{Supp}(X)$, we have that $\Pr[\mathrm{Ext}(x, R) \in T] > p + \epsilon$. However this implies that

$$|\mathrm{Ext}(X, R) - U_m| \ge |\Pr[\mathrm{Ext}(X, R) \in T] - \Pr[U_m \in T]| = \Big| \sum_{x \in \mathrm{Supp}(X)} \Pr[X = x] \Pr[\mathrm{Ext}(x, R) \in T] - p \Big| > |p + \epsilon - p| = \epsilon,$$

which contradicts the fact that $\mathrm{Ext}$ is a $(k, \epsilon)$ seeded extractor. □

Lemma 5.2. Let $\mathrm{Ext}_1 : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ be a $(k_1, \epsilon_1)$ seeded extractor, and $\mathrm{Ext}_2 : \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^{m_2}$ be a $(k_2, \epsilon_2)$ strong seeded extractor. Let $Y$ be an $(n, 2k_1)$ source and $X$ be an independent $(n, k_2)$ source. For $i = 0, 1, \cdots, 2^d - 1$, let $Z_i = \mathrm{Ext}_2(X, \mathrm{Ext}_1(Y, r_i))$, where $r_i$ is the $d$ bit string of $i$'s binary expression. Then with probability $1 - 2^{-k_1}$ over the fixing of $Y$, there exists a subset $S \subseteq \{0, 1, \cdots, 2^d - 1\}$ such that the following holds:

• $|S| \ge (1 - \sqrt{\epsilon_2} - \epsilon_1) 2^d$.

• $\forall i \in S$, we have $|Z_i - U_{m_2}| \le \sqrt{\epsilon_2}$.

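Proof. Let $R$ be a uniform random string over $\{0,1\}^m$. Since $\mathrm{Ext}_2$ is a $(k_2, \epsilon_2)$ strong seeded extractor, we have

$$\Pr_{r \leftarrow R}\big[\, |\mathrm{Ext}_2(X, r) - U_{m_2}| \ge \sqrt{\epsilon_2} \,\big] \le \sqrt{\epsilon_2}.$$

Let $\mathrm{Bad}_X = \{r \in \{0,1\}^m : |\mathrm{Ext}_2(X, r) - U_{m_2}| \ge \sqrt{\epsilon_2}\}$; then $|\mathrm{Bad}_X| \le \sqrt{\epsilon_2}\, 2^m$. Now let $R'$ be the uniform distribution over $\{0,1\}^d$ and let $\mathrm{Bad}_Y = \{y \in \{0,1\}^n : \Pr[\mathrm{Ext}_1(y, R') \in \mathrm{Bad}_X] > \sqrt{\epsilon_2} + \epsilon_1\}$. Then by Claim 5.1 we have that

$$|\mathrm{Bad}_Y| < 2^{k_1}.$$

Thus if $Y$ is an $(n, 2k_1)$ source, then $\Pr_{y \leftarrow Y}[y \in \mathrm{Bad}_Y] \le 2^{k_1} 2^{-2k_1} = 2^{-k_1}$. When $y \notin \mathrm{Bad}_Y$, we have that $\Pr[\mathrm{Ext}_1(y, R') \in \mathrm{Bad}_X] \le \sqrt{\epsilon_2} + \epsilon_1$, which implies that there exists a subset $S \subseteq \{0, 1, \cdots, 2^d - 1\}$ with $|S| \ge (1 - \sqrt{\epsilon_2} - \epsilon_1) 2^d$ and $\forall i \in S$, $|Z_i - U_{m_2}| = |\mathrm{Ext}_2(X, \mathrm{Ext}_1(y, r_i)) - U_{m_2}| \le \sqrt{\epsilon_2}$. □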

Suppose we have an $(n, k)$ source $X$ with $k \ge \mathrm{polylog}(n)$ and an independent SR-source $Y = Y^1 \circ \cdots \circ Y^N$ with $N = \mathrm{poly}(n)$ rows, each row having $0.9k$ bits, such that a large fraction of the rows are uniform. The following algorithm from [Li13a] takes $X$ and $Y$ as inputs and outputs another SR-source $Z$ such that a large fraction of the rows are roughly $h$-wise independent.


Algorithm 5.3 (SSR(X, Y) [Li13a]).

Input: $X$: an $(n, k)$-source with $k \ge \mathrm{polylog}(n)$. $Y = Y^1 \circ \cdots \circ Y^N$: an SR-source with $N = \mathrm{poly}(n)$ rows, each row having $0.9k$ bits, independent of $X$.

Output: $Z$: a source that is close to an SR-source.

Sub-Routines and Parameters:

Let $0 < \alpha < \beta < 1$ be the two constants above. Let $\ell = k^{\beta}$. Pick an integer $h$ such that $k^{\alpha} \le h < 2k^{\alpha}$ and $h = 2^l$ for some integer $l > 0$. Let $\mathrm{Ext}_q$, $\mathrm{Ext}_w$ be strong extractors with optimal parameters from Theorem 3.4, set up to extract from $((h^2 + 12)\ell, 10\ell)$ sources and $(n, 10\ell)$ sources respectively, with seed length $\ell$, error $\epsilon_2 = 2^{-\Omega(\ell)}$ and output length $\ell$. These will be used in laExt. Let $\mathrm{Ext}$ be a strong extractor with optimal parameters from Theorem 3.4, set up to extract from $(0.9k, 2(h^2 + 12)\ell)$ sources, with seed length $\ell$, error $\epsilon_2 = 2^{-\Omega(\ell)}$ and output length $(h^2 + 12)\ell$.

1. For every $i = 1, \cdots, N$, use $X$ and $Y^i$ to compute $Z^i$ as follows.

(a) Compute the binary expression of $i - 1$, which consists of $d = \log N = O(\log n)$ bits. Divide these bits sequentially from left to right into $b = \lceil d/l \rceil$ blocks of size $l$ (the last block may have fewer than $l$ bits, in which case we add 0s at the end to make it $l$ bits). Now from left to right, for each block $j = 1, \cdots, b$, we obtain an integer $\mathrm{Ind}_{ij} \le 2^l$ such that the binary expression of $\mathrm{Ind}_{ij} - 1$ is the same as the bits in block $j$.

(b) Let $Y^{i1}$ be the first $(h^2 + 12)\ell$ bits of $Y^i$. Set $j = 1$. While $j < b$ do the following.

i. Compute $(R^{ij}_1, \cdots, R^{ij}_h) = \mathrm{laExt}(X, Y^{ij})$, where $Q = Y^{ij}$ and $S_1$ is the first $\ell$ bits of $Y^{ij}$.

ii. Compute $Y^{i(j+1)} = \mathrm{Ext}(Y^i, R^{ij}_{\mathrm{Ind}_{ij}})$.

iii. Set $j = j + 1$.

(c) Finally, compute $(R^{ib}_1, \cdots, R^{ib}_h) = \mathrm{laExt}(X, Y^{ib})$ and set $Z^i = R^{ib}_{\mathrm{Ind}_{ib}}$.

2. Let $Z = Z^1 \circ \cdots \circ Z^N$.
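
Step 1(a) of the algorithm is a purely combinatorial indexing step and can be illustrated directly. The sketch below, with the hypothetical function name `block_indices`, computes the block indices for a given row number; the variable names `d` and `l` mirror the number of index bits and the block size in the algorithm.

```python
import math


def block_indices(i: int, d: int, l: int) -> list:
    """Return [Ind_1, ..., Ind_b] for row i (1-based), following step 1(a).

    d : number of bits in the binary expression of i - 1 (so 1 <= i <= 2**d)
    l : block size, so that each index lies in 1..2**l
    """
    bits = format(i - 1, "0{}b".format(d))   # d-bit binary expression of i - 1
    b = math.ceil(d / l)                     # number of blocks
    bits = bits.ljust(b * l, "0")            # pad the last block with 0s at the end
    # Block j encodes Ind_j - 1, hence the +1.
    return [int(bits[j * l:(j + 1) * l], 2) + 1 for j in range(b)]


example = block_indices(6, 5, 2)   # i - 1 = 5 = 00101 -> blocks 00, 10, 1(0)
```

For $i = 6$, $d = 5$, $l = 2$ the padded string is `001010`, giving blocks `00`, `10`, `10` and indices 1, 3, 3. Two rows with different numbers differ in at least one block, which is what lets the look-ahead outputs selected in that block be (roughly) independent.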
We now introduce some notation as in [Li13a]. For any $i \in [N]$ and $j \in [b]$, we let $Y^{i(\le j)}$ denote $(Y^{i1}, \cdots, Y^{ij})$, let $R^{i(\le j)}_{\mathrm{Ind}_{i(\le j)}}$ denote $(R^{i1}_{\mathrm{Ind}_{i1}}, \cdots, R^{ij}_{\mathrm{Ind}_{ij}})$ and let $f^j(i)$ denote the integer whose binary expression is the concatenation of the binary expression of $i - 1$ from block 1 to block $j$. The following lemma is proved in [Li13a].

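Lemma 5.4. Assume that $k \ge 2(bh + 2)(h^2 + 12)\ell$. Fix any $v \in [N]$ such that $Y^v$ is uniform. Let $S \subseteq [N]$ be any subset with $|S| = h$ and $v \in S$. For any $j \in [b]$, define $S^j_v = \{i \in S : f^j(i) < f^j(v)\}$. Then for any $j \in [b]$, we have that

$$\Big(R^{vj}_{\mathrm{Ind}_{vj}}, \{R^{ij}_{\mathrm{Ind}_{ij}}, i \in S^j_v\}, \{Y^{i(\le j)}, i \in S\}, \{R^{i(\le j-1)}_{\mathrm{Ind}_{i(\le j-1)}}, i \in S\}\Big) \approx_{O(jh\epsilon_2)} \Big(U_\ell, \{R^{ij}_{\mathrm{Ind}_{ij}}, i \in S^j_v\}, \{Y^{i(\le j)}, i \in S\}, \{R^{i(\le j-1)}_{\mathrm{Ind}_{i(\le j-1)}}, i \in S\}\Big).$$

Moreover, conditioned on the fixing of $\big(\{Y^{i(\le j)}, i \in S\}, \{R^{i(\le j-1)}_{\mathrm{Ind}_{i(\le j-1)}}, i \in S\}\big)$, we have that

1. $X$ and $Y$ are still independent.

2. $(R^{ij}_{\mathrm{Ind}_{ij}}, i \in S)$ are all deterministic functions of $X$.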

Now we can prove the following lemma, which is slightly stronger than a similar lemma in [Li13a].

Lemma 5.5. Assume that $k \ge 2(bh + 2)(h^2 + 12)\ell$, $X$ is an $(n, k)$-source and $Y$ is an $N \times 0.9k$ SR-source independent of $X$, with $N = \mathrm{poly}(n)$, such that there exists a subset $S \subseteq [N]$ and for any $i \in S$, $Y^i$ is uniform. Let $Z = Z^1 \circ \cdots \circ Z^N = \mathrm{SSR}(X, Y)$. Then for any subset $S' \subseteq S$ with $|S'| = h$, we have that

$$((Z^i, i \in S'), Y) \approx_\epsilon (U_{h\ell}, Y),$$

where $\epsilon = O(bh^2\epsilon_2) = 2^{-\Omega(\ell)}$.

Proof. We order the elements in $S'$ to be $i_1 < i_2 < \cdots < i_h$. Since $S' \subseteq S$, for any $j \in [h]$ we have that $Y^{i_j}$ is uniform. We now apply Lemma 5.4 to the set $S'$ with $j = b$. Note that $f^b(i) = i - 1$, thus for any $v \in S'$ we have $S'^b_v = \{i \in S' : i < v\}$. Also note that $Z^i = R^{ib}_{\mathrm{Ind}_{ib}}$ for any $i \in [N]$. Thus by Lemma 5.4, for any $j \in [h]$ we have that

$$\Big(Z^{i_j}, Z^{i_1}, \cdots, Z^{i_{j-1}}, \{Y^{i(\le b)}, i \in S'\}, \{R^{i(\le b-1)}_{\mathrm{Ind}_{i(\le b-1)}}, i \in S'\}\Big) \approx_{O(jh\epsilon_2)} \Big(U_\ell, Z^{i_1}, \cdots, Z^{i_{j-1}}, \{Y^{i(\le b)}, i \in S'\}, \{R^{i(\le b-1)}_{\mathrm{Ind}_{i(\le b-1)}}, i \in S'\}\Big),$$

where $\epsilon_2 = 2^{-\Omega(\ell)}$.

Note that by Lemma 5.4, conditioned on the fixing of $\big(\{Y^{i(\le b)}, i \in S'\}, \{R^{i(\le b-1)}_{\mathrm{Ind}_{i(\le b-1)}}, i \in S'\}\big)$, we have that $X$ and $Y$ are still independent, and $(R^{ib}_{\mathrm{Ind}_{ib}}, i \in S') = (Z^i, i \in S')$ are all deterministic functions of $X$. Thus we also have

$$\Big(Z^{i_j}, Z^{i_1}, \cdots, Z^{i_{j-1}}, \{Y^{i(\le b)}, i \in S'\}, \{R^{i(\le b-1)}_{\mathrm{Ind}_{i(\le b-1)}}, i \in S'\}, Y\Big) \approx_{O(jh\epsilon_2)} \Big(U_\ell, Z^{i_1}, \cdots, Z^{i_{j-1}}, \{Y^{i(\le b)}, i \in S'\}, \{R^{i(\le b-1)}_{\mathrm{Ind}_{i(\le b-1)}}, i \in S'\}, Y\Big),$$

and therefore (since $j \le b$)

$$(Z^{i_j}, Z^{i_1}, \cdots, Z^{i_{j-1}}, Y) \approx_{O(bh\epsilon_2)} (U_\ell, Z^{i_1}, \cdots, Z^{i_{j-1}}, Y).$$

Note this holds for every $j$, thus by a standard hybrid argument we have that

$$(Z^{i_1}, \cdots, Z^{i_h}, Y) \approx_\epsilon (U_{h\ell}, Y),$$

where $\epsilon = O(bh^2\epsilon_2) = O(bh^2 2^{-\Omega(\ell)}) = 2^{-\Omega(\ell)}$ since $\ell = k^{\beta}$, $h < 2k^{\alpha}$ and $b < \log n = k^{O(1)}$. □

We can now describe the algorithm to create an SR-source such that a large fraction of the rows are roughly $h$-wise independent, from just two independent sources $X$ and $Y$.


Algorithm 5.6 (SR(X, Y)).

Input: $X, Y$: two independent $(n, 2k)$-sources with $k \ge \mathrm{polylog}(n)$.

Output: $Z$: a source that is close to an SR-source.

Sub-Routines and Parameters:

Let $0 < \alpha < \beta < 1$ be the two constants defined before. Let $\ell = k^{\beta}$. Let $\mathrm{Ext}_1$, $\mathrm{Ext}_2$ be two strong seeded extractors with optimal parameters from Theorem 3.4, set up to extract from $(n, k)$ sources. $\mathrm{Ext}_1$ has seed length $d = O(\log n)$, error $\epsilon_1 = 1/4$ and output length $\ell$; $\mathrm{Ext}_2$ has seed length $\ell$, error $\epsilon_2 = 2^{-\Omega(\ell)}$ and output length $\ell$. Let $\mathrm{Ext}_3$ be another strong seeded extractor with optimal parameters from Theorem 3.4, set up to extract from $(n, k)$ sources, with seed length $\ell$, error $\epsilon_2$ and output length $0.9k$ (we will choose the parameters such that $2k - (h + 1)\ell \ge k$).

1. Let $N = 2^d = \mathrm{poly}(n)$. For every $i = 1, \cdots, N$, let $r_i$ be the $d$ bit string which is the binary expression of $i - 1$. Compute $W_i = \mathrm{Ext}_2(X, \mathrm{Ext}_1(Y, r_i))$ and $Y^i = \mathrm{Ext}_3(Y, W_i)$. Let $Y = Y^1 \circ \cdots \circ Y^N$.

2. Compute $Z = \mathrm{SSR}(X, Y)$ using Algorithm 5.3.


We now have the following lemma. 

Lemma 5.7. Assume that $k \ge 2(bh + 2)(h^2 + 12)\ell$. There exists a constant $C > 1$ such that if $\ell \ge Ch \log n$, then with probability $1 - 2^{-\Omega(\ell)}$ over the fixing of $Y$, the following property is satisfied: there exists a subset $T \subseteq [N]$ such that $|T| \ge \frac{2}{3}N$ and $\forall S \subseteq T$ with $|S| = h$, we have

$$|(Z^i, i \in S) - U_{h\ell}| \le 2^{-\Omega(\ell)}.$$

Proof. Let $W = W_1 \circ \cdots \circ W_N$. We first show that with high probability over the fixing of $Y$, $W$ is an SR-source with a large fraction of close to uniform rows. This follows directly from Lemma 5.2. Specifically, the lemma implies that with probability $1 - 2^{-k}$ over the fixing of $Y$, there exists a subset $T \subseteq [N]$ with $N = 2^d = \mathrm{poly}(n)$ such that $|T| \ge (1 - \sqrt{\epsilon_2} - 1/4)N \ge \frac{2}{3}N$ since $\epsilon_2 = 2^{-\Omega(\ell)}$; and $\forall i \in T$, we have $|W_i - U_\ell| \le \sqrt{\epsilon_2} = 2^{-\Omega(\ell)}$.

Now consider any $y \in \mathrm{Supp}(Y)$ which makes the above happen. We would like to show that conditioned on this $Y = y$, in the final output $Z$, the same set $T$ of rows will also have the property of being roughly $h$-wise independent. However, this may not be the case; and if not, we will call such a $y$ bad. Now fix any bad $y$. Then we know that there must be a subset $S \subseteq T$ with $|S| = h$ such that $|(Z^i, i \in S) - U_{h\ell}| > \epsilon'$ for some $\epsilon' = 2^{-\Omega(\ell)}$. At the same time, since $S \subseteq T$ we also know that $\forall i \in S$, we have $|W_i - U_\ell| \le \sqrt{\epsilon_2} = 2^{-\Omega(\ell)}$. Let

$$\mathrm{Bad}_S = \{y \in \mathrm{Supp}(Y) : \forall i \in S, |W_i - U_\ell| \le \sqrt{\epsilon_2} \text{ but } |(Z^i, i \in S) - U_{h\ell}| > \epsilon'\}$$

for some $\epsilon' = 2^{-\Omega(\ell)}$. Then we must have $y \in \mathrm{Bad}_S$. Therefore, any bad $y$ must be in $\bigcup_{S \subseteq [N], |S| = h} \mathrm{Bad}_S$. By the union bound we know

$$\Pr_{y \leftarrow Y}[y \text{ is bad}] \le \sum_{S \subseteq [N], |S| = h} \Pr[\mathrm{Bad}_S].$$

Thus to bound the probability of a bad $y$ we only need to bound $\Pr[\mathrm{Bad}_S]$.
Now fix any subset $S \subseteq [N]$ with $|S| = h$. Let $R = \{\mathrm{Ext}_1(Y, r_i), i \in S\}$. We now bound $\Pr[\mathrm{Bad}_S]$ as follows. Define

$$A_S = \{r \in \mathrm{Supp}(R) : \forall i \in S, |W_i - U_\ell| \le \sqrt{\epsilon_2}\}.$$

Then

$$\Pr[\mathrm{Bad}_S] = \sum_{r \in A_S} \Pr[R = r] \Pr[\mathrm{Bad}_S \mid R = r].$$

We now estimate $\Pr[\mathrm{Bad}_S \mid R = r]$. First we know that conditioned on any $R = r$, we have that $\forall i \in S$, $|W_i - U_\ell| \le \sqrt{\epsilon_2}$. Thus by Lemma 3.8 we can get rid of the error one by one for each $i \in S$, and there exists another random variable $(W'_i, i \in S)$ such that $\forall i \in S$, $W'_i = U_\ell$ and $|(W_i, i \in S) - (W'_i, i \in S)| \le h\sqrt{\epsilon_2}$. From now on we will think of $(W_i, i \in S)$ as being $(W'_i, i \in S)$ (i.e., every row is truly uniform). This only adds $h\sqrt{\epsilon_2}$ to the final error. Now, since the size of $R$ is bounded by $h\ell$, by Lemma 3.7 we have that

$$\Pr_{r \leftarrow R}[H_\infty(Y \mid R = r) \ge 2k - h\ell - \ell \ge k] \ge 1 - 2^{-\ell}.$$

Now we have the following two cases.

Case 1: $H_\infty(Y \mid R = r) < k$. In this case we will just bound $\Pr[\mathrm{Bad}_S \mid R = r]$ by $\Pr[\mathrm{Bad}_S \mid R = r] \le 1$. However, the probability of such $R = r$ is at most $2^{-\ell}$.

Case 2: $H_\infty(Y \mid R = r) \ge k$. In this case, we know that $\forall i \in S$, $W_i$ is uniform and independent of $Y$ (since it is a deterministic function of $X$ conditioned on the fixing of $R = r$). Thus by Theorem 3.4 we have that

$$|(Y^i, W_i) - (U_{0.9k}, W_i)| \le \epsilon_2.$$

Therefore $\forall i \in S$, with probability $1 - \sqrt{\epsilon_2}$ over the fixing of $W_i$, $Y^i$ is $\sqrt{\epsilon_2}$-close to uniform. Let $W^S = \{W_i, i \in S\}$. Then with probability $1 - h\sqrt{\epsilon_2}$ over the fixing of $W^S$, each $Y^i$ is $\sqrt{\epsilon_2}$-close to uniform. Thus again by Lemma 3.8, $Y^S = \{Y^i, i \in S\}$ is $h\sqrt{\epsilon_2}$-close to another source $Y'^S = \{Y'^i, i \in S\}$ where $\forall i$, $Y'^i = U_{0.9k}$. Now since the size of $W^S$ is $h\ell$, again by Lemma 3.7 we have that with probability $1 - 2^{-\ell}$ over the fixing of $W^S$, $X$ still has min-entropy at least $k$. Thus, in summary, with probability $1 - h\sqrt{\epsilon_2} - 2^{-\ell}$ over the fixing of $W^S$, we have that $X$ has min-entropy at least $k$, $Y^S = \{Y^i, i \in S\}$ is $h\sqrt{\epsilon_2}$-close to $Y'^S = \{Y'^i, i \in S\}$, and $X$ and $Y^S$ are independent (since $W^S$ is a deterministic function of $X$). Assume for now that $Y^S$ is just $Y'^S$; then we can apply Lemma 5.5 to conclude that in this case, we have

$$|((Z^i, i \in S), Y) - (U_{h\ell}, Y)| \le O(bh^2\epsilon_2).$$

Therefore with probability $1 - O(bh\sqrt{\epsilon_2})$ over the fixing of $Y$, we have that $|(Z^i, i \in S) - U_{h\ell}| \le h\sqrt{\epsilon_2}$. Now adding back all the errors, we get that with probability $1 - O(bh\sqrt{\epsilon_2}) - h\sqrt{\epsilon_2} = 1 - O(bh\sqrt{\epsilon_2})$ over the fixing of $Y$, we have that

$$|(Z^i, i \in S) - U_{h\ell}| \le h\sqrt{\epsilon_2} + h\sqrt{\epsilon_2} + h\sqrt{\epsilon_2} + 2^{-\ell} \le (3h + 1)\sqrt{\epsilon_2}.$$

Now let $\epsilon' = (3h + 1)\sqrt{\epsilon_2} = 2^{-\Omega(\ell)}$ since $\epsilon_2 = 2^{-\Omega(\ell)}$, $\ell = k^{\beta}$ and $h < 2k^{\alpha}$. We have that in Case 2,

$$\Pr[\mathrm{Bad}_S \mid R = r] \le O(bh\sqrt{\epsilon_2}).$$

Therefore for any fixed $S$, we have that

$$\Pr[\mathrm{Bad}_S] \le 2^{-\ell} + \Pr[A_S] \cdot O(bh\sqrt{\epsilon_2}) = O(bh\sqrt{\epsilon_2}) = 2^{-\Omega(\ell)}$$

since $b < \log n = k^{O(1)}$ and $h < 2k^{\alpha}$.

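Thus

$$\Pr_{y \leftarrow Y}[y \text{ is bad}] \le \binom{N}{h} 2^{-\Omega(\ell)} \le N^h 2^{-\Omega(\ell)} = 2^{-\Omega(\ell) + O(h \log n)} = 2^{-\Omega(\ell)},$$

if we choose $h, \ell$ such that $\ell \ge Ch \log n$ for some sufficiently large constant $C > 1$.

Now subtracting the probability mass of the bad $y$'s, we get that with probability $1 - 2^{-k} - 2^{-\Omega(\ell)} = 1 - 2^{-\Omega(\ell)}$ over the fixing of $Y$, there exists a subset $T \subseteq [N]$ such that $|T| \ge \frac{2}{3}N$ and $\forall S \subseteq T$ with $|S| = h$, we have

$$|(Z^i, i \in S) - U_{h\ell}| \le \epsilon' = 2^{-\Omega(\ell)}.$$

□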


Next we describe the lightest bin protocol, defined in [Li13b].

Lightest bin protocol: Assume there are $N$ strings $\{z^i, i \in [N]\}$ where each $z^i \in \{0,1\}^m$ with $m \ge \log N$. The output of a lightest bin protocol with $r < N$ bins is a subset $T \subseteq [N]$ that is obtained as follows. Imagine that each string $z^i$ is associated with a player $P_i$. Now, for each $i$, $P_i$ uses the first $\log r$ bits of $z^i$ to select a bin $j$, i.e., if the first $\log r$ bits of $z^i$ are the binary expression of $j - 1$, then $P_i$ selects bin $j$. Now let bin $l$ be the bin that is selected by the fewest number of players. Then

$$T = \{i \in [N] : P_i \text{ selects bin } l\}.$$
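
The protocol is simple enough to state as code. In the following sketch the function name, the representation of each $z^i$ as a Python string of `0`/`1` characters, and the tie-breaking rule (smallest bin index wins) are assumptions made for illustration; the description above does not specify how ties are broken.

```python
def lightest_bin(strings, r):
    """Return T, the 1-based indices of players that chose the lightest bin.

    strings : list of bit strings z^1, ..., z^N (each at least log2(r) long)
    r       : number of bins, assumed here to be a power of 2
    """
    log_r = r.bit_length() - 1
    assert (1 << log_r) == r, "r is assumed to be a power of 2"
    bins = {j: [] for j in range(1, r + 1)}
    for i, z in enumerate(strings, start=1):
        # The first log r bits of z^i are the binary expression of j - 1.
        j = int(z[:log_r], 2) + 1 if log_r > 0 else 1
        bins[j].append(i)
    # Fewest players wins; ties go to the smallest bin index (an assumption).
    lightest = min(bins, key=lambda j: (len(bins[j]), j))
    return bins[lightest]


players = ["00", "01", "01", "10", "11", "11", "00"]
chosen = lightest_bin(players, 4)
```

In this example bins 1, 2 and 4 each receive two players and bin 3 receives one, so the output is the single player in bin 3. Note that an empty bin, if one exists, is the lightest, in which case the returned set is empty.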

The following lemma is proved in [Li13a].

Lemma 5.8. For every constant $0 < \gamma < 1$ there exists a constant $C_1 > 1$ such that the following holds. For any $n, k, m, N \in \mathbb{N}$, any even integer $h \ge C_1$ and any $\epsilon > 0$ with $N \ge h^2$, $\epsilon \le N^{-6h}$, $k \ge 20h(\log n + \log(1/\epsilon))$ and $m \ge 10(\log n + \log(1/\epsilon))$ (footnote 2), assume that we have $N$ sources $\{Z_1^i, i \in [N]\}$ over $m$ bits and a subset $S \subseteq [N]$ with $|S| \ge \delta N$ for some constant $\delta > 1/2$, such that for any $S' \subseteq S$ with $|S'| = h$, we have

$$(Z_1^i, i \in S') \approx_\epsilon U_{hm}.$$

Let $Z_1 = Z_1^1 \circ \cdots \circ Z_1^N$. Use $Z_1$ to run the lightest bin protocol with $r = \frac{\gamma^2}{16h} N^{1 - \frac{2}{\sqrt{h}}}$ bins (footnote 3) and let the output contain $N_2$ elements $\{i_1, \cdots, i_{N_2} \in [N]\}$. Assume that $X$ is an $(n, k)$ source independent of $Z_1$. For any $j \in [N_2]$, let $Z_2^j = \mathrm{Ext}(X, Z_1^{i_j})$ where $\mathrm{Ext}$ is the strong seeded extractor in Theorem 3.4 that has seed length $m$ and outputs $m_2 = k/(2h)$ bits with error $\epsilon$. Then with probability at least $1 - N^{-h/2}$ over the fixing of $Z_1$, there exists a subset $S_2 \subseteq [N_2]$ with $|S_2| \ge \delta(1 - \gamma)N/r \ge \delta(1 - \gamma)N_2$ such that for any $S'_2 \subseteq S_2$ with $|S'_2| = h$, we have

$$(Z_2^i, i \in S'_2) \approx_{\epsilon_2} U_{hm_2}$$

with $\epsilon_2 \le N_2^{-6h}$ and $m_2 \ge 10(\log n + \log(1/\epsilon_2))$.

We can now present our construction of extractors for independent sources. 


2 The constants actually depend on the hidden constant in the seed length $d = O(\log(n/\epsilon))$ of an optimal seeded extractor. Nevertheless they are always constants and do not really affect our analysis. For simplicity and clarity we use 20, 10 here.

3 For simplicity, we assume that $r$ is a power of 2. If not, we can always replace it with a power of 2 that is at most $2r$. This does not affect our analysis.

Algorithm 5.9 (Independent Source Extractor IExt).

Input: $X$: an $(n, 2k)$-source with $k \ge \frac{1}{2}\log^{12} n$. $Y = (Y_1, Y_2)$: a $(2k, 2k)$ block source where each block has $n$ bits, independent of $X$.

Output: $V$: a random variable close to uniform.

Sub-Routines and Parameters:

Let SR be the function in Algorithm 5.6. Let BasicExt be the extractor in Theorem 3.5. Let Ext be the strong extractor in Theorem 3.4. Let $0 < \alpha < \beta < 1$ be the two constants defined before. Let $0 < \gamma < 1$ be the constant in Lemma 5.8. We will choose $\alpha = 1/6$, $\beta = 1/3$ and $\gamma = 1/4$. Let $h, \ell$ be the two parameters in Algorithm 5.3 with $k^{\alpha} \le h < 2k^{\alpha}$ and $\ell = k^{\beta}$.

1. Compute $Z = Z^1 \circ \cdots \circ Z^N = \mathrm{SR}(X, Y_1)$.

2. Let $N = \mathrm{poly}(n)$ be the number of rows in $Z$. Run the lightest bin protocol with $Z$ and $r = \frac{\gamma^2}{16h} N^{1 - \frac{2}{\sqrt{h}}}$ bins and let the output contain $N_1$ elements $\{i_1, \cdots, i_{N_1} \in [N]\}$. Let $Z_1 = Z_1^1 \circ \cdots \circ Z_1^{N_1}$ be the concatenation of the corresponding rows in $Z$ (i.e., $Z_1^j = Z^{i_j}$).

3. Note that $N_1 \le \lfloor N/r \rfloor$. Without loss of generality assume that $N_1 = \lfloor N/r \rfloor$. If not, add rows of all 0 strings to $Z_1$ until $N_1 = \lfloor N/r \rfloor$.

4. For any $j \in [N_1]$, compute $Z_2^j = \mathrm{Ext}(Y_2, Z_1^j)$ and output $m_2 = \sqrt{k}$ bits. Let $Z_2 = Z_2^1 \circ \cdots \circ Z_2^{N_1}$.

5. For any $j \in [N_1]$, compute $Z_3^j = \mathrm{Ext}(X, Z_2^j)$ and output $m_3 = 1.9k$ bits. Let $Z_3 = Z_3^1 \circ \cdots \circ Z_3^{N_1}$.

6. Compute $V = \mathrm{BasicExt}(Y_2, Z_3)$.


We now have the following theorem. 

Theorem 5.10. There exists a constant $C_0 > 1$ such that for any $n, k \in \mathbb{N}$ with $n \ge C_0$ and $k \ge \frac{1}{2}\log^{12} n$, if $X$ is an $(n, 2k)$-source and $Y = (Y_1, Y_2)$ is an independent $(2k, 2k)$ block source where each block has $n$ bits, then

$$|(\mathrm{IExt}(X, Y), Y) - (U_m, Y)| \le \epsilon$$

and

$$|(\mathrm{IExt}(X, Y), X) - (U_m, X)| \le \epsilon,$$

where $m = 1.8k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.

Proof. By Lemma 5.7, with probability $1 - 2^{-\Omega(\ell)}$ over the fixing of $Y_1$, there exists a subset $T \subseteq [N]$ such that $|T| \ge \frac{2}{3}N$ and $\forall S \subseteq T$ with $|S| = h$, we have

$$|(Z^i, i \in S) - U_{h\ell}| \le 2^{-\Omega(\ell)}.$$

We now want to apply Lemma 5.8. But first let us check that the conditions of Lemma 5.7 and Lemma 5.8 are satisfied. Note that $k^{\alpha} \le h < 2k^{\alpha}$, $\ell = k^{\beta}$ and $b < \log n$. To apply Lemma 5.7, we need that $k \ge 2(bh + 2)(h^2 + 12)\ell$ and $\ell \ge Ch \log n$ for some sufficiently large constant $C > 1$. To apply Lemma 5.8, we need that $\epsilon' \le N^{-6h}$, $k \ge 20h(\log n + \log(1/\epsilon'))$ and $m = \ell \ge 10(\log n + \log(1/\epsilon'))$. In Algorithm 5.6 we also need $k \ge (h + 1)\ell$. Altogether, it suffices to have $0 < \alpha < \beta < 1$ satisfy the following conditions.

$$k \ge 3 \log n \cdot h^3 \ell, \quad \ell \ge Ch \log n, \quad \epsilon' \le N^{-6h} \quad \text{and} \quad \ell \ge 10(\log n + \log(1/\epsilon')).$$

These conditions are satisfied if the following conditions are satisfied.

$$k \ge 24 k^{3\alpha+\beta} \log n \quad \text{and} \quad \ell = k^{\beta} \ge C k^{\alpha} \log n$$

for some constant $C > 1$.

Now if $\alpha = 1/6$, $\beta = 1/3$ and $k \ge \frac{1}{2}\log^{12} n$, then we see that for sufficiently large $n$,

$$\frac{k}{k^{3\alpha+\beta}} = k^{1/6} \ge \Omega(\log^2 n) > 24 \log n \quad \text{and} \quad \frac{k^{\beta}}{k^{\alpha}} = k^{1/6} \ge \Omega(\log^2 n) > C \log n.$$

Thus the above conditions are satisfied.
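
The two sufficient conditions can be sanity-checked numerically, which also makes the "sufficiently large $n$" caveat concrete. In the sketch below the function name, the sample value $C = 10$, and the choice $k = \log^{12} n$ (the bound quoted in the overview) are assumptions for illustration only.

```python
def parameter_conditions_hold(k, log_n, alpha, beta, C):
    """Check k >= 24 * k^(3*alpha+beta) * log n  and  k^beta >= C * k^alpha * log n."""
    cond_k = k >= 24 * k ** (3 * alpha + beta) * log_n
    cond_ell = k ** beta >= C * k ** alpha * log_n
    return cond_k and cond_ell


ALPHA, BETA = 1 / 6, 1 / 3
C_SAMPLE = 10  # hypothetical value for the constant C

# log n = 64: k^(1/6) = 64^2 = 4096 exceeds both 24*64 and C*64.
large_ok = parameter_conditions_hold(64 ** 12, 64, ALPHA, BETA, C_SAMPLE)
# log n = 16: k^(1/6) = 256 is below 24*16 = 384, so n is not yet large enough.
small_ok = parameter_conditions_hold(16 ** 12, 16, ALPHA, BETA, C_SAMPLE)
```

Both conditions reduce to comparing $k^{1/6} \approx \log^2 n$ against a constant multiple of $\log n$, so they hold once $\log n$ exceeds that constant.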

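Notice that $m_2 = \sqrt{k} < k/(2h)$, thus by Lemma 5.8 we have that with probability at least $1 - N^{-\sqrt{h}/2}$ over the fixing of $Z$, there exists a subset $S \subseteq [N_1]$ with $|S| \geq \delta(1 - \gamma)N/r \geq \frac{3}{4} \cdot \frac{2}{3} N/r = \frac{1}{2}N/r$ such that for any $S' \subseteq S$ with $|S'| = h$, we have

$$(Z_2^i, i \in S') \approx_{\epsilon_2} U_{h\sqrt{k}}$$

with $\epsilon_2 < N^{-6h}$.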

Note that $Z_2$ is a deterministic function of $Y_2$ and $Z_1$, and $Z_1$ is a deterministic function of $Z$. Thus we also have that with probability at least $1 - N^{-\sqrt{h}/2}$ over the fixing of $Z_1$, the above property holds. Also note that $N/r = \frac{16h}{\gamma^2}N^{2/\sqrt{h}} \geq 16h$, so $|S| \geq 8h > 1$. Thus with probability at least $1 - N^{-\sqrt{h}/2}$ over the fixing of $Z_1$, we have that $Z_2$ is $N_1^{-6h} \leq (8h)^{-6h}$-close to an SR source (since $N_1 \geq |S|$).

Note that conditioned on the fixing of $Z_1$, we have that $Z_2$ is a deterministic function of $Y_2$, and is thus independent of $X$. Now note that $N/r = \frac{16h}{\gamma^2}N^{2/\sqrt{h}}$. Since $h \geq k^\alpha = k^{1/6}$ and $k \geq \frac{1}{2}\log^{12} n$, we have that

$$N^{2/\sqrt{h}} \leq \mathrm{poly}(n)^{O(1/\log n)} = O(1).$$

Thus $N_1 \leq N/r = O(h) < k^{1/4}$. Note that conditioned on the fixing of $Y_1$, we have that $Z_1$ is a deterministic function of $X$, with the size of $Z_1$ bounded by $k^{1/4}\ell < k^{2/3}$. Therefore by Lemma 3.7, we have that with probability $1 - 2^{-0.05k}$ over the fixing of $Z_1$, $X$ still has min-entropy at least $2k - k^{2/3} - 0.05k > 1.94k$.
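
To make the arithmetic behind the last two bounds explicit, here is a short worked computation under the parameter choice $\beta = 1/3$ (so $\ell = k^{1/3}$) used in this proof. The explicit threshold $k > 10^6$ is derived here only for illustration and is not stated in the original argument.

```latex
\begin{align*}
|Z_1| &\le N_1 \cdot \ell < k^{1/4} \cdot k^{1/3} = k^{7/12} < k^{2/3}, \\
2k - k^{2/3} - 0.05k &= 1.95k - k^{2/3} > 1.94k
  \iff k^{2/3} < 0.01\,k \iff k > 10^{6}.
\end{align*}
```

Since $k$ is at least polylogarithmic in $n$, this threshold is met for all sufficiently large $n$.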

Now since $Z_2$ is independent of $X$, and assuming that $Z_2$ is indeed an SR-source, by Theorem 3.4 we have that for some $i \in [N_1]$,

$$|(Z_3^i, Z_2^i) - (U_{1.9k}, Z_2^i)| \leq 2^{-\Omega(\sqrt{k})}.$$
Thus with probability at least $1 - 2^{-\Omega(\sqrt{k})}$ over the fixing of $Z_1$ (and thus also the fixing of $Z_2$), we have that $Z_3$ is $2^{-\Omega(\sqrt{k})}$-close to an $N_1 \times 1.9k$ SR-source. Moreover, conditioned on the further fixing of $Z_2$, we have that $Z_3$ is a deterministic function of $X$, and is thus independent of $Y_2$. Furthermore, note the size of $Z_2$ is bounded by $N_1\sqrt{k} < k^{1/4}\sqrt{k} = k^{3/4}$. Thus again by Lemma 3.7, we have that with probability $1 - 2^{-0.05k}$ over the fixing of $Z_2$, $Y_2$ still has min-entropy at least $2k - k^{3/4} - 0.05k > 1.94k$.

Note that $N_1 < k^{1/4}$ and $k^{1-2/4} = k^{1/2} > \log^{1.1} n$, thus by Theorem 3.5, we have that


$$|(V, Y_2) - (U_m, Y_2)| \leq \epsilon_2$$

and

$$|(V, Z_3) - (U_m, Z_3)| \leq \epsilon_2,$$

where $m = 1.8k$ and $\epsilon_2 = 2^{-k^{\Omega(1)}}$. Since we have already fixed $Y_1$, $Z_1$ and $Z_2$, we have that $Z_3$ is a deterministic function of $X$. Thus conditioned on $Z_3$, we have that $V$ is a deterministic function of $Y_2$, which is independent of $X$. Thus we also have that

$$|(V, X) - (U_m, X)| \leq \epsilon$$

and

$$|(V, Y) - (U_m, Y)| \leq \epsilon,$$

where by adding back all the errors we have $\epsilon = 2^{-k^{\Omega(1)}}$.$^4$

Note that when n < Co, the extractor can be constructed in constant time just by exhaustive 
search (in fact, we can get a two-source extractor in this way). Thus, we have the following theorem 
(by replacing 2k with k). 

Theorem 5.11. For all $n, k \in \mathbb{N}$ with $k \geq \log^{12} n$, there is an efficiently computable function $\mathsf{IExt} : \{0,1\}^n \times \{0,1\}^{2n} \to \{0,1\}^m$ such that if $X$ is an $(n,k)$-source and $Y = (Y_1, Y_2)$ is an independent $(k,k)$ block source where each block has $n$ bits, then

$$|(\mathsf{IExt}(X,Y), Y) - (U_m, Y)| \leq \epsilon$$

and

$$|(\mathsf{IExt}(X,Y), X) - (U_m, X)| \leq \epsilon,$$

where $m = 0.9k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.$^5$

As a corollary, we immediately obtain the following theorem. 

4 One can show that in this case £2 is as well as all the other terms. So the entire error is 

5 The constant 0.9 can be replaced by any constant less than 1. 





Theorem 5.12. For all $n, k \in \mathbb{N}$ with $k \geq \log^{12} n$, there is an efficiently computable three-source extractor $\mathsf{IExt} : (\{0,1\}^n)^3 \to \{0,1\}^m$ such that if $X, Y, Z$ are three independent $(n,k)$-sources, then

$$|\mathsf{IExt}(X,Y,Z) - U_m| \leq \epsilon,$$

where $m = 0.9k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.
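
The reduction behind this corollary is a regrouping of inputs: two independent $(n,k)$-sources $Y, Z$ form a $(k,k)$ block source $(Y, Z)$, since the second keeps min-entropy $k$ conditioned on any value of the first. A minimal Python sketch of that regrouping follows. Here `iext_block` is a hypothetical stand-in for the function of Theorem 5.11, the string encoding of bits is an assumption, and `toy_block_function` is only a placeholder that makes the snippet runnable; it is not an extractor.

```python
from typing import Callable, Tuple

Bits = str  # assumption: bit strings are Python strings over '0'/'1'
BlockExtractor = Callable[[Bits, Tuple[Bits, Bits]], Bits]


def make_three_source_extractor(
        iext_block: BlockExtractor) -> Callable[[Bits, Bits, Bits], Bits]:
    """Wrap a (source, two-block source) function as a three-source function."""
    def iext3(x: Bits, y: Bits, z: Bits) -> Bits:
        # Treat the two independent sources (y, z) as one block source with two blocks.
        return iext_block(x, (y, z))
    return iext3


def toy_block_function(x: Bits, y_blocks: Tuple[Bits, Bits]) -> Bits:
    """Placeholder only: concatenates first bits, with no extraction guarantee."""
    y1, y2 = y_blocks
    return x[0] + y1[0] + y2[0]


iext3 = make_three_source_extractor(toy_block_function)
```

The wrapper adds no logic of its own, which mirrors why the theorem follows immediately from Theorem 5.11.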

If the entropy $k$ gets very close to $\log^2 n$, then we can use a construction similar to the extractor in [Li13a], except that we replace the step of creating the initial SR-source with the method in this paper. In this case we can get an extractor for two independent block sources, each with a constant number of blocks of min-entropy $k$. We have the following algorithm.


Algorithm 5.13 (Block Source Extractor BExt).

Input: $X = (X_1, X_2, \cdots, X_t)$, $Y = (Y_1, Y_2, \cdots, Y_t)$, two independent $(2k, 2k, \cdots, 2k)$-block sources where each block has $n$ bits and $k \geq \frac{1}{2}\log^{2+\eta} n$ for any constant $\eta > 0$.
Output: $W$, a random variable close to uniform.

Sub-Routines and Parameters:



Let SR be the function in Algorithm 5.6. Let BasicExt be the extractor in Theorem 3.5. Let Ext 
be the strong extractor in Theorem 3.4. Let a = an d P = §(2+F) ’ w ^ ere h = be the 

two constants defined before, and 7 = 2L be another constant. Let h,£ be the two parameters 
in Algorithm 5.3 with $k^\alpha \leq h \leq 2k^\alpha$ and $\ell = k^\beta$.

1. Compute $Z_1 = Z_1^1 \circ \cdots \circ Z_1^{N_1} = \mathsf{SR}(X_1, Y_1)$ where $N_1 = \mathrm{poly}(n)$. Set the boolean indicator $v_y = 1$.

2. Set $t = 1$. While $N_t$ (the number of rows in $Z_t$) is bigger than $\frac{16h^3}{\gamma^2}$ do the following:

(a) Run the lightest bin protocol with $Z_t$ and $r_t = \frac{\gamma^2}{16h}N_t^{1 - 2/\sqrt{h}}$ bins and let the output contain $N_{t+1}$ elements $\{i_1, i_2, \cdots, i_{N_{t+1}} \in [N_t]\}$.

(b) If $v_y = 1$, take a fresh new block $Y'$ from $Y$, and for any $j \in [N_{t+1}]$, compute $Z_{t+1}^j = \mathsf{Ext}(Y', Z_t^{i_j})$ and output $\ell < k/(2h)$ bits (note that we have $k > 2h\ell$ by our choices of $\alpha, \beta$). Set $v_y = 0$. Otherwise, take a fresh new block $X'$ from $X$, and for any $j \in [N_{t+1}]$, compute $Z_{t+1}^j = \mathsf{Ext}(X', Z_t^{i_j})$ and output $\ell < k/(2h)$ bits. Set $v_y = 1$.

(c) Let $Z_{t+1} = Z_{t+1}^1 \circ \cdots \circ Z_{t+1}^{N_{t+1}}$. Set $t = t + 1$.

3. At the end of the above iteration we get a source $Z_t$ with $N_t \leq \frac{16h^3}{\gamma^2}$ rows. Without loss of generality assume that at this time $v_y = 0$ (otherwise switch the roles of $X$ and $Y$), and the last two blocks of $X, Y$ used are $X', Y'$. For any $j \in [N_t]$, compute $Z'^j = \mathsf{Ext}(X', Z_t^j)$ and output $m_2 = 1.9k$ bits. Let $Z' = Z'^1 \circ \cdots \circ Z'^{N_t}$.

4. Compute $W = \mathsf{BasicExt}(Y', Z')$.
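
The control flow of step 2 can be illustrated by tracking only the number of rows. The sketch below is a simulation of the row-count recursion, not an implementation of the extractor. It assumes the bin count is $r_t = \frac{\gamma^2}{16h}N_t^{1-2/\sqrt{h}}$ (so that $N_{t+1} = \frac{16h}{\gamma^2}N_t^{2/\sqrt{h}}$) and that the loop stops once $N_t \leq 16h^3/\gamma^2$. The function names and the sample values of $h$, $\gamma$ and $N_1$ are hypothetical.

```python
import math
from typing import List


def row_count_trajectory(log2_n1: float, h: int, gamma: float) -> List[float]:
    """Return log2(N_t) for t = 1, 2, ... until the while-loop of step 2 exits.

    Assumed recursion: N_{t+1} = (16 h / gamma^2) * N_t^(2 / sqrt(h)),
    assumed stopping threshold: 16 h^3 / gamma^2 rows.
    Logarithms are used to avoid astronomically large integers.
    """
    log2_factor = math.log2(16 * h / gamma ** 2)
    log2_stop = math.log2(16 * h ** 3 / gamma ** 2)
    exponent = 2 / math.sqrt(h)
    trajectory = [log2_n1]
    while trajectory[-1] > log2_stop:
        nxt = log2_factor + exponent * trajectory[-1]
        if nxt >= trajectory[-1]:
            raise ValueError("row count does not shrink; h is too small for this gamma")
        trajectory.append(nxt)
    return trajectory


def blocks_used(trajectory: List[float]) -> int:
    """Two blocks for SR(X_1, Y_1), plus one fresh block per loop iteration."""
    return 2 + (len(trajectory) - 1)


traj = row_count_trajectory(log2_n1=200.0, h=64, gamma=0.5)
```

For these sample values the row count drops from $2^{200}$ to below the threshold $2^{24}$ in three iterations, so five blocks are consumed in total; the fresh blocks alternate between $Y$ and $X$ as in step 2(b).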
We now have the following theorem.

Theorem 5.14. For every constant $\eta > 0$ there exists a constant $C_0 > 1$ such that for any $n, k \in \mathbb{N}$ with $n \geq C_0$ and $k \geq \frac{1}{2}\log^{2+\eta} n$, if $X = (X_1, X_2, \cdots, X_t)$, $Y = (Y_1, Y_2, \cdots, Y_t)$ are two independent $(2k, 2k, \cdots, 2k)$-block sources where each block has $n$ bits and $t = \lceil 7/\eta \rceil + 1$, then

$$|(\mathsf{BExt}(X,Y), Y) - (U_m, Y)| \leq \epsilon$$

and

$$|(\mathsf{BExt}(X,Y), X) - (U_m, X)| \leq \epsilon,$$

where $m = 1.8k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.

Proof. (Sketch) By Lemma 5.7, with probability $1 - 2^{-\Omega(\ell)}$ over the fixing of $Y_1$, there exists a subset $T \subseteq [N]$ such that $|T| \geq \frac{3}{4}N$ and $\forall S \subseteq T$ with $|S| = h$, we have

$$|(Z_1^i, i \in S) - U_{h\ell}| \leq 2^{-\Omega(\ell)}.$$

We now want to apply Lemma 5.8. Again, we need to first make sure that the conditions of Lemma 5.7 and Lemma 5.8 are satisfied. As in the proof of Theorem 5.10, these conditions are satisfied if the following conditions are satisfied.

$$k \geq 24k^{3\alpha+\beta}\log n \quad \text{and} \quad \ell = k^\beta \geq Ck^\alpha \log n$$

for some constant $C > 1$.

Thus when k > \ log 2+,? n, a = o{2+F) ’ @ = W+F) ’ an< ^ ^ = 0-95??, we have that for sufficiently 
large n, 


k 

j^3ot-\-/3 


P~rM 1 jj 

fcW+iF > Cl(log i+ 6 n) > 24log n and 


k^ 

kP 


°tM -1 | jj, 

kW+t7) > 0(log 1+ 6 n ) > Clogn. 


Also note that $\ell < k/(2h)$, thus the output length in each iteration of the lightest bin protocol also satisfies the condition of Lemma 5.8. Thus we can apply that lemma. Note that we can first fix $Y_1$, and conditioned on this fixing $Z_1$ is a deterministic function of $X_1$, and thus at the end of the first iteration of the lightest bin protocol, we can use $Z_1$ to extract $Z_2$ from $Y_2$. By Lemma 5.8, again with probability $1 - 2^{-k^{\Omega(1)}}$ over the further fixing of $X_1$ (and thus $Z_1$), we will have that $Z_2$ has the $h$-wise independent property. Moreover now $Z_2$ is a deterministic function of $Y_2$ and thus at the end of the next iteration of the lightest bin protocol, we can use $Z_2$ to extract $Z_3$ from $X_2$. Thus, since in the algorithm we are applying the lightest bin protocol in an "alternating" manner, the whole algorithm works through as if we are dealing with independent sources.

Note that the lightest bin protocol stops only if the number of rows in $Z_t$ is at most $\frac{16h^3}{\gamma^2}$. Thus before the iteration stops, we always have $N_t \geq \frac{16h^3}{\gamma^2} \geq h^3 \geq h^2$. By Lemma 5.8 the probability of the "bad event" in each iteration is at most $N_t^{-\sqrt{h}/2} \leq (h^2)^{-\sqrt{h}/2} = 2^{-k^{\Omega(1)}}$. We now compute the number of iterations needed to decrease the number of rows from $N_1 = \mathrm{poly}(n)$ to $\frac{16h^3}{\gamma^2}$.
In each iteration the number of rows in $Z_t$ decreases from $N_t$ to $N_{t+1} \leq \frac{16h}{\gamma^2}N_t^{2/\sqrt{h}}$. When $N_t \geq h^{\sqrt{h}}$, we have that $N_t^{2/\sqrt{h}} \geq h^2 \geq \frac{16h}{\gamma^2}$. Thus

$$N_{t+1} \leq \frac{16h}{\gamma^2}N_t^{2/\sqrt{h}} \leq N_t^{4/\sqrt{h}}.$$

Therefore, as long as $N_t \geq h^{\sqrt{h}}$, in each iteration the number of rows in $Z_t$ decreases from $N_t$ to $N_{t+1} \leq N_t^{4/\sqrt{h}}$. Since initially we have $N_1 = \mathrm{poly}(n)$, the number of iterations needed to decrease the number of rows from $N_1 = \mathrm{poly}(n)$ to $h^{\sqrt{h}}$ is at most $c'$, which equals


$$\frac{\log\left(\frac{\log N_1}{\sqrt{h}\log h}\right)}{\log\left(\sqrt{h}/4\right)} = \frac{\log\log N_1 - \frac{1}{2}\log h - \log\log h}{\frac{1}{2}\log h - 2} = \frac{\log\log n + O(1) - \frac{1}{2}\log h - \log\log h}{\frac{1}{2}\log h - 2}$$

$$< \frac{\log\log n}{\frac{1}{2}\log h - 2} - 1 \leq \frac{13.2}{\eta} - 1 < \frac{14}{\eta} - 1 \quad \left(\text{since } k \geq \tfrac{1}{2}\log^{2+\eta} n\right).$$

Once $N_t < h^{\sqrt{h}}$, in the next iteration we have

$$N_{t+1} \leq \frac{16h}{\gamma^2}N_t^{2/\sqrt{h}} \leq \frac{16h^3}{\gamma^2}.$$

Thus the number of iterations needed to decrease the number of rows from $N_1 = \mathrm{poly}(n)$ to $\frac{16h^3}{\gamma^2}$ is at most $c_3 = c' + 1 \leq \frac{14}{\eta}$, which is also a constant. Since $\gamma = \frac{\eta}{42}$ we have that in each $Z_t$, the fraction of "good rows" is at least $\frac{3}{4}(1 - \gamma)^{c_3} \geq \frac{3}{4}(1 - c_3\gamma) \geq \frac{3}{4} \cdot \frac{2}{3} \geq 1/2$, which satisfies the requirement of Lemma 5.8. Also note that there exists a constant $C_0 = C_0(\eta)$ such that whenever $n \geq C_0$ and $k \geq \log^2 n$ we have $h \geq k^\alpha \geq C_1$ where $C_1$ is the constant in Lemma 5.8. Thus all the conditions are met. Note that the number of blocks from $(X, Y)$ used in the iteration is at most $c_3 + 2$.
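
The step $(1 - \gamma)^{c_3} \geq 1 - c_3\gamma$ used above is Bernoulli's inequality. For completeness, the induction step is spelled out below for any $0 \leq \gamma \leq 1$ and integer $c \geq 1$; this is a standard fact rather than something specific to this proof.

```latex
\begin{align*}
(1-\gamma)^{c+1} &= (1-\gamma)^{c}\,(1-\gamma)
  \;\ge\; (1 - c\gamma)(1-\gamma) \\
  &= 1 - (c+1)\gamma + c\gamma^{2}
  \;\ge\; 1 - (c+1)\gamma .
\end{align*}
```

Consequently, over a constant number $c_3$ of iterations the fraction of good rows shrinks by a factor of at least $1 - c_3\gamma$, which stays bounded away from zero when $\gamma$ is a sufficiently small constant.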

Finally, when we stop at step $t$, we can fix all previous blocks of $(X, Y)$ used in the algorithm except $(X', Y')$. Since the number of blocks is a constant, with probability $1 - 2^{-k^{\Omega(1)}}$ over this fixing, we have that $Z_{t-1}$ has the $h$-wise independent property as in Lemma 5.8. Moreover now $Z_{t-1}$ is a deterministic function of $X$. Let $Z'_{t-1}$ be the concatenation of the rows of $Z_{t-1}$ with index in the output of the last lightest bin protocol. Note that $Z'_{t-1}$ is a deterministic function of $Z_{t-1}$ and has at most $\frac{16h^3}{\gamma^2}$ rows. Without loss of generality, we can assume that $Z'_{t-1}$ has exactly $\frac{16h^3}{\gamma^2}$ rows, otherwise we can add rows of all-0 strings to it until this is achieved. This ensures that $Z'_{t-1}$ is a deterministic function of $Z_{t-1}$ with a fixed output domain. Thus the size of $Z'_{t-1}$ is bounded

by = 0(k 3a+ P) = o{k 3+£). 

We now fix $Z'_{t-1}$. By Lemma 3.7 with probability $1 - 2^{-k^{\Omega(1)}}$ over the fixing of $Z'_{t-1}$, we have
that X' still has min-entropy 2 k — o{k 2+ ^) — kF 1 ^ = 2 k — o(k). Also, by Lemma 5.8 with probability 
$1 - 2^{-k^{\Omega(1)}}$ over the fixing of $Z_{t-1}$ (and thus also the fixing of $Z'_{t-1}$), we have that $Z_t$ has the $h$-wise independence property. Thus with probability $1 - 2^{-k^{\Omega(1)}}$ over the fixing of $Z'_{t-1}$, we have that $Z_t$ is $2^{-k^{\Omega(1)}}$-close to an SR-source.

Moreover, conditioned on the fixing of $Z'_{t-1}$, we have that $Z_t$ is a deterministic function of $Y'$, and is thus independent of $X'$. Thus by Theorem 3.4, with probability $1 - 2^{-k^{\Omega(1)}}$ over the fixing of $Z_t$, we have that $Z'$ is $2^{-k^{\Omega(1)}}$-close to an SR-source (where each row has $1.9k$ bits). Note that the size
of Zt is also bounded by -kp-7 = 0(fc 3 “ +/3 ) = o(k 2 +rf Thus again by Lemma 3.7 with probability 

1—2 _fcn(1) over the fixing of Zt, we have that Y' still has min-entropy 2k— o(fcTT)—= 2 k—o(k). 
Moreover, conditioned on the fixing of Zt, we have that Z' is a deterministic function of X ', and is 

o 3/x 

thus independent of Y'. Note that the number of rows in Z' is at most = 0(k 3a ) = 0(k^Z+l a) 

^_o 3/x 2 Q 

and k W+i^> = k 2 +r > log n, thus by Theorem 3.5 we have that 

$$|(W, Y') - (U_m, Y')| \leq 2^{-k^{\Omega(1)}}$$

and

$$|(W, Z') - (U_m, Z')| \leq 2^{-k^{\Omega(1)}},$$

where $m = 1.8k$. Note that we have fixed all previously used blocks of $(X, Y)$, and now $Z'$ is a deterministic function of $X'$. Thus conditioned on the fixing of $Z'$, we have that $W$ is a deterministic function of $Y'$, and is thus independent of $X$. Therefore by adding back all the errors we also have

$$|(W, Y) - (U_m, Y)| \leq 2^{-k^{\Omega(1)}}$$

and

$$|(W, X) - (U_m, X)| \leq 2^{-k^{\Omega(1)}}.$$

Finally, note that the number of blocks required in each block source is at most $\left\lceil \frac{c_3 + 2}{2} \right\rceil \leq \left\lceil \frac{7}{\eta} \right\rceil + 1$.

Note that when n < Co, the extractor can be constructed in constant time just by exhaustive 
search (in fact, we can get a two-source extractor in this way). Thus, we have the following theorem 
(by replacing 2k with k). 


Theorem 5.15. For every constant $\eta > 0$ and all $n, k \in \mathbb{N}$ with $k \geq \log^{2+\eta} n$, there is an efficiently computable extractor $\mathsf{BExt} : (\{0,1\}^n)^t \times (\{0,1\}^n)^t \to \{0,1\}^m$ with $t = \lceil 7/\eta \rceil + 1$, such that if $X = (X_1, X_2, \cdots, X_t)$, $Y = (Y_1, Y_2, \cdots, Y_t)$ are two independent $(k, k, \cdots, k)$-block sources where each block has $n$ bits, then

$$|(\mathsf{BExt}(X,Y), Y) - (U_m, Y)| \leq \epsilon$$

and

$$|(\mathsf{BExt}(X,Y), X) - (U_m, X)| \leq \epsilon,$$

where $m = 0.9k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.$^6$


6 The constant 0.9 can be replaced by any constant less than 1.
As a corollary, we immediately obtain the following theorem.

Theorem 5.16. For every constant $\eta > 0$ and all $n, k \in \mathbb{N}$ with $k \geq \log^{2+\eta} n$, there is an efficiently computable extractor $\mathsf{IExt} : (\{0,1\}^n)^t \to \{0,1\}^m$ with $t = \lceil 14/\eta \rceil + 2$ such that if $X_1, \cdots, X_t$ are $t$ independent $(n,k)$-sources, then

$$|\mathsf{IExt}(X_1, \cdots, X_t) - U_m| \leq \epsilon,$$

where $m = 0.9k$ and $\epsilon = 2^{-k^{\Omega(1)}}$.


6 Conclusions and Open Problems 

In this paper we constructed an explicit extractor for three independent $(n,k)$ sources with min-entropy $k \geq \log^{12} n$ and error $\epsilon = 2^{-k^{\Omega(1)}}$. In fact, our extractor works for one $(n,k)$ source and another independent $(k,k)$ block source. This improves the previously best known construction for general $(n,k)$ sources in [Li13a], and brings the construction of independent source extractors close to optimal. We also have improved results for the case of $k \geq \log^{2+\eta} n$ for any constant $\eta > 0$, where we achieve a better constant-source extractor and in fact an extractor for two independent block sources, each having a constant number of blocks. As a by-product, we developed a general method to reduce the error in somewhere random sources from $1/\mathrm{poly}(n)$ to $2^{-k^{\Omega(1)}}$ while keeping the number of rows at $\mathrm{poly}(n)$, at the cost of one extra weak source.

Our new results essentially subsume all previous results about independent source extractors, except in the case of two-source extractors. The natural next step is thus to try to break the entropy rate 0.49 barrier in Bourgain's extractor [Bou05]. Another interesting direction is to use our techniques to build better two-source dispersers and Ramsey graphs, in the spirit of [BRSW06]. Finally, it would be interesting to see if the techniques developed recently by the author in [Li13b, Li13a] and here can be applied to the constructions of extractors and dispersers for other classes of sources, such as affine sources and small space sources.